Team Information

Name: team_bbg

Size: 2

Member Name                  Member NetId
Pushpit Saxena               pushpit2
Venslaus Prakash Arokiaraj   vpa2

Introduction

This project studies the factors that play a statistically significant role in influencing life expectancy. We consider a wide variety of factors, including economic and social factors, health-service factors (such as immunization levels), mortality rates, and various other health-related factors.

Dataset Information

According to the dataset description on Kaggle, the Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of the health status and many other related factors for all countries. The datasets are made available to the public for the purpose of health data analysis. This dataset was collected from the WHO and United Nations websites, and the individual data files were then combined into a single dataset (read more here).

Description

The dataset we use for this project is the Life Expectancy data, which can be found at Life Expectancy (WHO). The dataset has 22 variables and 2938 observations and needs some cleanup. (Note: we have also provided the dataset as part of the .zip [lifeExpectancyData] uploaded along with the project.)

Data fields

Following are some of the important variables used in this dataset:

  • Country (String): Country of observation

  • Year (Integer): Year of observation

  • Status (String): Whether the country of observation is developed or developing.

  • Life expectancy (Decimal): Life expectancy in years

  • Adult Mortality (Integer): Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population)

  • Infant deaths (Integer): Number of Infant Deaths per 1000 population

  • Alcohol (Decimal): Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol)

  • Percentage Expenditure (Decimal): Expenditure on health as a percentage of Gross Domestic Product per capita(%)

  • Hepatitis B (Int): Hepatitis B (HepB) immunization coverage among 1-year-olds (%)

  • Measles (Int): Measles - number of reported cases per 1000 population

  • BMI (Decimal): Average Body Mass Index of entire population

  • Under-five deaths (Int): Number of under-five deaths per 1000 population

  • Polio (Int): Polio (Pol3) immunization coverage among 1-year-olds (%)

  • Total expenditure (Decimal): General government expenditure on health as a percentage of total government expenditure (%)

  • Diphtheria (Int): Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)

  • HIV/AIDS (Decimal): Deaths per 1 000 live births HIV/AIDS (0-4 years)

  • GDP (Decimal): Gross Domestic Product per capita (in USD)

  • Population (Int): Population of the country

Project Goals

  • From a research standpoint, one of the primary reasons we picked this dataset is its size and the variety of predictors available. We believe this dataset is well suited for practicing and implementing the majority of the techniques we learned in the course and for getting hands-on experience with a real-life dataset. In this project (see Methods and Results), we explore the following:
    • Fit additive & interactive model
    • Utilize step-wise search (AIC/BIC)
    • Explore transformations (LOG) of predictors/response
    • Explore models with polynomial terms
    • Utilize statistical significance tests like \(t\)-test and \(F\)-test
    • Utilize and analyze diagnostic plots like Residuals v. Fitted & Normal Q-Q plots.
    • Utilize performance metrics like \(R^2\) and \(\text{Test-RMSE}\)
    • Explore outlier detection
  • From the larger perspective of data exploration and using data science to address real-world issues, this dataset also gives us an opportunity to examine one of the most important questions facing the human race: which factors affect longevity. As we briefly mentioned in our project description, we are interested in determining the factors that contribute to lower life expectancy. At the end, we discuss the most important factors identified by our final model. We also present some visualizations that help explain the effect of selected factors on life expectancy, as well as the overall trend of life expectancy across the world over the years.

Methods and Results

Note: We have grouped the Methods and Results sections together, as it is more convenient to demonstrate the flow of our research in building the model. We also have a separate Results section showing the combined results of all the models we experimented with; that section also shows detailed information and plots for the final model.

Data Cleaning:

Loading the Data:

Changing the names of the fields to follow a more consistent pattern (snake_case):
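A minimal sketch of one way to do this renaming in base R. The helper name `to_snake_case` and the sample headers are illustrative assumptions, not the code used for this report.

```r
# Hypothetical helper: convert raw column headers to snake_case.
to_snake_case <- function(x) {
  x <- tolower(trimws(x))            # lower-case and strip surrounding blanks
  x <- gsub("[^a-z0-9]+", "_", x)    # collapse each run of non-alphanumerics into "_"
  gsub("^_+|_+$", "", x)             # drop leading/trailing underscores
}

# Illustrative headers (assumed, not read from the real file).
raw_names <- c("Life expectancy ", "Adult Mortality", "HIV/AIDS", "under-five deaths")
clean_names <- to_snake_case(raw_names)
print(clean_names)
```

With a data frame `df`, the same helper could be applied as `names(df) <- to_snake_case(names(df))`.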

Snippet of the raw dataset:

## # A tibble: 2,938 x 24
##    country year  status life_expectancy adult_mortality infant_deaths alcohol
##    <fct>   <fct> <fct>            <dbl>           <int>         <int>   <dbl>
##  1 Afghan… 2015  Devel…            65               263            62    0.01
##  2 Afghan… 2014  Devel…            59.9             271            64    0.01
##  3 Afghan… 2013  Devel…            59.9             268            66    0.01
##  4 Afghan… 2012  Devel…            59.5             272            69    0.01
##  5 Afghan… 2011  Devel…            59.2             275            71    0.01
##  6 Afghan… 2010  Devel…            58.8             279            74    0.01
##  7 Afghan… 2009  Devel…            58.6             281            77    0.01
##  8 Afghan… 2008  Devel…            58.1             287            80    0.03
##  9 Afghan… 2007  Devel…            57.5             295            82    0.02
## 10 Afghan… 2006  Devel…            57.3             295            84    0.03
## # … with 2,928 more rows, and 17 more variables: percentage_expenditure <dbl>,
## #   hepatitis_b <int>, measles <int>, bmi <dbl>, under_five_deaths <int>,
## #   polio <int>, total_expenditure <dbl>, diphtheria <int>, hiv_aids <dbl>,
## #   gdp <dbl>, population <dbl>, thinness_1_19_years <dbl>,
## #   thinness_5_9_years <dbl>, income_composition_of_resources <dbl>,
## #   schooling <dbl>, continent <fct>, region <fct>

Summary of numeric fields:

Field                             Min.    1st Qu.      Median         Mean     3rd Qu.           Max.   NA's
life_expectancy                  36.30      63.10       72.10        69.22       75.70          89.00   10.0
adult_mortality                   1.00      74.00      144.00       164.80      228.00         723.00   10.0
infant_deaths                     0.00       0.00        3.00        30.30       22.00        1800.00    0.0
alcohol                           0.01       0.88        3.76         4.60        7.70          17.87  194.0
percentage_expenditure            0.00       4.69       64.91       738.25      441.53       19479.91    0.0
hepatitis_b                       1.00      77.00       92.00        80.94       97.00          99.00  553.0
measles                           0.00       0.00       17.00      2419.59      360.25      212183.00    0.0
bmi                               1.00      19.30       43.50        38.32       56.20          87.30   34.0
under_five_deaths                 0.00       0.00        4.00        42.04       28.00        2500.00    0.0
polio                             3.00      78.00       93.00        82.55       97.00          99.00   19.0
total_expenditure                 0.37       4.26        5.76         5.94        7.49          17.60  226.0
diphtheria                        2.00      78.00       93.00        82.32       97.00          99.00   19.0
hiv_aids                          0.10       0.10        0.10         1.74        0.80          50.60    0.1
gdp                               1.68     463.94     1766.95      7483.16     5910.81      119172.74  448.0
population                       34.00  195793.25  1386542.00  12753375.12  7420359.00  1293859294.00  652.0
thinness_1_19_years               0.10       1.60        3.30         4.84        7.20          27.70   34.0
thinness_5_9_years                0.10       1.50        3.30         4.87        7.20          28.60   34.0
income_composition_of_resources   0.00       0.49        0.68         0.63        0.78           0.95  167.0
schooling                         0.00      10.10       12.30        11.99       14.30          20.70  163.0

Only 10 observations have missing values for the response, life_expectancy. We drop those 10 observations, since removing them should make little difference to the models we fit.

## [1] 2928

There are still 1279 observations with some missing values. We will use the mean of the value for a given country to impute some of these values:

## [1] 800

Still there are some observations with missing values. Next we will use the mean of the values for a given region in a particular year to impute some of these missing values:

## [1] 0

Finally, we have imputed all the missing values, and our final dataset has 2928 observations.
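The two-stage imputation described above can be sketched in base R as follows. This is only an illustration on made-up toy data: the helper `impute_group_mean` is hypothetical, and whether the report's second stage used the already-imputed country values when computing region-year means is an assumption.

```r
# Toy data standing in for the real dataset (all values are made up).
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "C"),
  region  = c("R1", "R1", "R1", "R1", "R1", "R1"),
  year    = c(2014, 2015, 2016, 2014, 2015, 2014),
  gdp     = c(100, NA, 300, NA, NA, 50)
)

# Hypothetical helper: replace missing entries of x by the mean of their group.
# Groups with no observed value yield NaN, which is.na() still treats as missing.
impute_group_mean <- function(x, groups) {
  group_means <- ave(x, groups, FUN = function(v) mean(v, na.rm = TRUE))
  ifelse(is.na(x), group_means, x)
}

# Stage 1: per-country mean. Country B has no observed gdp, so it stays missing.
toy$gdp <- impute_group_mean(toy$gdp, toy$country)

# Stage 2: mean within each region-year cell fills what is left.
toy$gdp <- impute_group_mean(toy$gdp, paste(toy$region, toy$year))

n_missing <- sum(is.na(toy$gdp))
print(toy)
```

Country-level means preserve each country's own level where any data exist, while the region-year fallback handles countries for which a field is entirely missing.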

Data exploration and visualization

We present below some statistics and plots that helped us understand the dataset. Some of these plots also revealed interesting patterns in the dataset.

Statistics (by region)

Region                      #Records  Avg. Life Expectancy  Avg. Infant Deaths  Avg. Adult Deaths
East Asia & Pacific              422              71.34231           25.265403          137.62260
Europe & Central Asia            770              75.95456            2.724675          109.26432
Latin America & Caribbean        498              73.07319            7.339357          135.32661
Middle East & North Africa       320              73.16312           11.281250          105.65625
North America                     32              79.87500           14.093750           61.40625
South Asia                       128              67.37422          250.039062          164.50781
Sub-Saharan Africa               768              57.08685           47.593750          283.07812

Statistics (by continent)

Continent  #Records  Avg. Life Expectancy  Avg. Infant Deaths  Avg. Adult Deaths
Africa          864                 57.80           44.246528          266.57176
Americas        530                 73.90            7.747170          130.84659
Asia            752                 72.55           60.875000          133.43750
Europe          626                 77.80            1.172524           98.01282
Oceania         166                 69.40            1.120482          135.08750

Looking at the statistics and plot above, we can clearly see that countries on the African continent have some of the lowest life expectancy values of all countries. We also noticed that there are fewer observations for North American countries.

We can see that, on average, life_expectancy is improving over the years.

Again, we can see that African countries have some of the lowest life_expectancy values over the years and European countries have some of the highest life_expectancy values.

We can see that countries with lower gdp generally have lower life_expectancy.

We can see that countries with higher hiv_aids values generally have lower life_expectancy, on average.

We can see that countries with higher infant mortality rate have lower life_expectancy, on average.

  • Correlation matrix for the numerical predictors in the dataset:

We believe the visualizations/data analysis above gave us enough insights about the dataset we are dealing with and we can start with model building.


Model Building

Splitting the data in training and test set (90% training, 10% hold out test set):
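A minimal sketch of a 90/10 split and a test-RMSE helper, using R's built-in mtcars data as a stand-in for the life expectancy data. The seed, the `rmse` helper, and the toy model are assumptions for illustration; the report's own split code is not shown here.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.9 * n))  # 90% of the row indices
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]                  # remaining 10% held out

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Fit on the training rows only, then score on the held-out rows.
fit <- lm(mpg ~ wt + hp, data = train)
test_rmse <- rmse(test$mpg, predict(fit, newdata = test))
print(test_rmse)
```

For the 2928 cleaned observations, a 90% split gives 2635 training rows, which is consistent with the 2615 residual degrees of freedom plus 20 estimated coefficients reported in the full additive model summary.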

We ignore the categorical variables for now, except status. (We also fitted models using some of the other categorical variables but could not get better results; the code can be seen in the Appendix.)

We started by fitting a full additive model (with all the numerical predictors and status). This gives us a good baseline for both simple and more nuanced feature selection later.

  • Summary of the full additive model
## 
## Call:
## lm(formula = life_expectancy ~ ., data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6482  -2.2808  -0.1263   2.2784  17.4919 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.521e+01  6.664e-01  82.838  < 2e-16 ***
## statusDeveloping                -1.243e+00  2.834e-01  -4.386 1.20e-05 ***
## adult_mortality                 -1.818e-02  8.274e-04 -21.975  < 2e-16 ***
## infant_deaths                    9.078e-02  8.753e-03  10.371  < 2e-16 ***
## alcohol                          3.047e-02  2.696e-02   1.130   0.2585    
## percentage_expenditure           1.562e-04  7.757e-05   2.014   0.0441 *  
## hepatitis_b                     -2.051e-03  4.082e-03  -0.502   0.6155    
## measles                         -1.497e-05  7.809e-06  -1.917   0.0553 .  
## bmi                              3.719e-02  5.170e-03   7.192 8.27e-13 ***
## under_five_deaths               -6.775e-02  6.406e-03 -10.575  < 2e-16 ***
## polio                            2.547e-02  4.731e-03   5.384 7.92e-08 ***
## total_expenditure                1.127e-02  3.447e-02   0.327   0.7437    
## diphtheria                       3.305e-02  5.048e-03   6.547 7.04e-11 ***
## hiv_aids                        -4.746e-01  1.777e-02 -26.709  < 2e-16 ***
## gdp                              2.965e-05  1.195e-05   2.481   0.0132 *  
## population                       6.525e-10  1.900e-09   0.343   0.7313    
## thinness_1_19_years             -7.204e-02  4.970e-02  -1.449   0.1473    
## thinness_5_9_years              -7.721e-03  4.899e-02  -0.158   0.8748    
## income_composition_of_resources  6.297e+00  6.552e-01   9.611  < 2e-16 ***
## schooling                        7.378e-01  4.511e-02  16.355  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.959 on 2615 degrees of freedom
## Multiple R-squared:  0.828,  Adjusted R-squared:  0.8267 
## F-statistic: 662.3 on 19 and 2615 DF,  p-value: < 2.2e-16
  • Diagnostics plots for full additive model

  • Test \(\mathbf{\text{RMSE}} = 3.7453062\)
  • \(R^2 = 0.8279511\)
  • The summary shows some non-significant predictors in the full model. For example, for alcohol, using the \(t\)-test for significance:
    • Null: \(H_0:\) \(\beta_{alcohol} = 0\)
    • Alternative: \(H_1:\) \(\beta_{alcohol} \neq 0\)
    • Test statistic: \(1.1301202\)
    • P-value: \(0.2585292\)
    • Decision: Fail to reject null
    • Conclusion: alcohol does not have a significant linear relationship with life_expectancy, given the other predictors in the model
  • Also, we can see from the diagnostic plots that both the equal variance assumption and the normality assumption are suspect.
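The test statistic and p-value for alcohol can be reproduced from the coefficient table with a few lines of base R. The inputs below are the rounded estimate, standard error, and residual degrees of freedom printed in the summary, so the results agree with the reported values only to about three decimals.

```r
# Rounded values copied from the full additive model summary.
estimate  <- 3.047e-02   # alcohol coefficient
std_error <- 2.696e-02   # its standard error
df_resid  <- 2615        # residual degrees of freedom

# t statistic for H0: beta_alcohol = 0.
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(c(t = t_stat, p = p_value))
```

Since the p-value is well above common significance levels such as 0.05, the null hypothesis is not rejected.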

So we started with the simple (though not recommended) method of removing some of the least significant predictors. There also seems to be high collinearity between infant_deaths and under_five_deaths (see the variance inflation factors below and the correlation plot shown earlier).

##                          status                 adult_mortality 
##                        1.949467                        1.756053 
##                   infant_deaths                         alcohol 
##                      165.233443                        1.982791 
##          percentage_expenditure                     hepatitis_b 
##                        4.085965                        1.691643 
##                         measles                             bmi 
##                        1.372574                        1.795997 
##               under_five_deaths                           polio 
##                      165.237301                        2.038419 
##               total_expenditure                      diphtheria 
##                        1.202812                        2.389284 
##                        hiv_aids                             gdp 
##                        1.396273                        4.412362 
##                      population             thinness_1_19_years 
##                        1.555639                        8.034095 
##              thinness_5_9_years income_composition_of_resources 
##                        8.115884                        3.156427 
##                       schooling 
##                        3.775685
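For reference, a variance inflation factor can be computed by hand as \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on all the other predictors. The sketch below does this in base R on simulated data; the helper `vif_manual` and the toy variables are assumptions for illustration (the output above presumably came from a package function such as `vif()` in the car or faraway packages, which is also an assumption).

```r
# Hypothetical helper: VIF of each predictor via an auxiliary regression.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux_fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux_fit)$r.squared)
  })
}

# Simulated data: x2 is almost a copy of x1, x3 is independent noise.
set.seed(1)
x1 <- rnorm(200)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.05),
  x3 = rnorm(200)
)

vifs <- vif_manual(toy, c("x1", "x2", "x3"))
print(round(vifs, 2))
```

By the same formula, a VIF of about 165 for infant_deaths and under_five_deaths corresponds to \(R_j^2 \approx 0.994\), meaning each of the two is almost perfectly explained by the remaining predictors.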

So we removed some of the least significant predictors and, of the two collinear predictors, kept infant_deaths.

  • Summary (Significant additive model sig_additive_model):
## 
## Call:
## lm(formula = life_expectancy ~ adult_mortality + infant_deaths + 
##     bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources * 
##     status + schooling, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.9816  -2.2703  -0.0947   2.4075  18.9792 
## 
## Coefficients:
##                                                    Estimate Std. Error t value
## (Intercept)                                       4.604e+01  3.066e+00  15.017
## adult_mortality                                  -1.860e-02  8.439e-04 -22.047
## infant_deaths                                    -2.677e-03  7.299e-04  -3.667
## bmi                                               4.484e-02  4.994e-03   8.979
## diphtheria                                        5.473e-02  3.811e-03  14.360
## hiv_aids                                         -4.908e-01  1.806e-02 -27.176
## gdp                                               4.063e-05  7.465e-06   5.442
## income_composition_of_resources                   1.669e+01  3.706e+00   4.505
## statusDeveloping                                  6.624e+00  3.040e+00   2.179
## schooling                                         7.792e-01  4.482e-02  17.385
## income_composition_of_resources:statusDeveloping -9.567e+00  3.631e+00  -2.635
##                                                  Pr(>|t|)    
## (Intercept)                                       < 2e-16 ***
## adult_mortality                                   < 2e-16 ***
## infant_deaths                                     0.00025 ***
## bmi                                               < 2e-16 ***
## diphtheria                                        < 2e-16 ***
## hiv_aids                                          < 2e-16 ***
## gdp                                              5.74e-08 ***
## income_composition_of_resources                  6.94e-06 ***
## statusDeveloping                                  0.02944 *  
## schooling                                         < 2e-16 ***
## income_composition_of_resources:statusDeveloping  0.00846 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.076 on 2624 degrees of freedom
## Multiple R-squared:  0.817,  Adjusted R-squared:  0.8163 
## F-statistic:  1171 on 10 and 2624 DF,  p-value: < 2.2e-16
  • Diagnostic plot (Significant additive model sig_additive_model):

  • Test-\(\mathbf{\text{RMSE}} = 3.9150261\).
  • \(R^2 = 0.8170003\).

  • Comparison with full_additive_model:

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ adult_mortality + infant_deaths + bmi + diphtheria + 
##     hiv_aids + gdp + income_composition_of_resources * status + 
##     schooling
## Model 2: life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1   2624 43593                                  
## 2   2615 40985  9    2608.7 18.494 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model                        Model_Var            R_2        Test_RMSE
Full Additive Model          full_additive_model  0.8279511  3.745306
Significant Additive Model   sig_additive_model   0.8170003  3.915026

  • This procedure does not help: \(R^2\) drops and \(\text{Test-RMSE}\) rises relative to full_additive_model, and the model assumptions are still suspect. Based on the \(F\)-test, the smaller model (sig_additive_model) is also rejected in favor of the full additive model.
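As a worked check, the \(F\) statistic in the analysis of variance table for sig_additive_model versus full_additive_model follows from the usual formula for comparing a reduced model against a full model, using the sum of squares, degrees of freedom, and residual sum of squares printed in that table:

\[
F = \frac{(\text{RSS}_{\text{reduced}} - \text{RSS}_{\text{full}}) / q}{\text{RSS}_{\text{full}} / \text{df}_{\text{full}}}
  = \frac{2608.7 / 9}{40985 / 2615} \approx 18.49
\]

Here \(q = 9\) is the difference in residual degrees of freedom. Strictly speaking, this test presumes that the reduced model is nested in the full one; sig_additive_model also contains an income_composition_of_resources:status interaction that the full additive model lacks, so the result is best read as approximate.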

Next we tried a pair-wise interactive model (based on the model above sig_additive_model)
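In R formula syntax, `(a + b + c)^2` expands to all main effects plus all pairwise interactions. The short sketch below, which only inspects the formula and fits nothing, counts the terms generated for the eight predictors used in this model; the variable names `f`, `labels`, `n_main`, and `n_inter` are illustrative.

```r
# The pairwise-interaction formula, written out as in the model call.
f <- life_expectancy ~ (adult_mortality + under_five_deaths + bmi + diphtheria +
  hiv_aids + gdp + income_composition_of_resources + schooling)^2

# terms() expands the formula without needing any data.
labels <- attr(terms(f), "term.labels")

n_main  <- sum(!grepl(":", labels))  # main effects
n_inter <- sum(grepl(":", labels))   # pairwise interactions, choose(8, 2)

print(c(main = n_main, interactions = n_inter, total = length(labels)))
```

The 8 main effects and 28 interactions give 36 terms, matching the 36 model degrees of freedom reported in the \(F\)-statistic line of the model summary.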

  • Fitting the model
  • Summary (Significant Interactive Model sig_interactive_model):
## 
## Call:
## lm(formula = life_expectancy ~ (adult_mortality + under_five_deaths + 
##     bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources + 
##     schooling)^2, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.8667  -2.0800  -0.0783   2.0935  14.9944 
## 
## Coefficients:
##                                                     Estimate Std. Error t value
## (Intercept)                                        4.560e+01  1.566e+00  29.119
## adult_mortality                                   -3.429e-03  3.164e-03  -1.084
## under_five_deaths                                  9.951e-03  4.195e-03   2.372
## bmi                                                3.534e-01  3.022e-02  11.692
## diphtheria                                         1.465e-01  1.677e-02   8.733
## hiv_aids                                          -1.186e+00  1.378e-01  -8.607
## gdp                                                4.622e-04  9.661e-05   4.784
## income_composition_of_resources                   -1.458e+01  2.907e+00  -5.016
## schooling                                          1.338e+00  1.875e-01   7.137
## adult_mortality:under_five_deaths                 -1.128e-06  5.387e-06  -0.209
## adult_mortality:bmi                                3.349e-05  5.962e-05   0.562
## adult_mortality:diphtheria                        -9.966e-05  3.239e-05  -3.076
## adult_mortality:hiv_aids                           8.373e-04  6.911e-05  12.115
## adult_mortality:gdp                               -5.788e-08  1.717e-07  -0.337
## adult_mortality:income_composition_of_resources   -7.377e-03  7.182e-03  -1.027
## adult_mortality:schooling                         -8.181e-04  4.551e-04  -1.798
## under_five_deaths:bmi                             -3.054e-04  1.111e-04  -2.749
## under_five_deaths:diphtheria                      -1.373e-05  2.680e-05  -0.512
## under_five_deaths:hiv_aids                        -6.917e-04  3.445e-04  -2.008
## under_five_deaths:gdp                             -2.036e-07  5.638e-07  -0.361
## under_five_deaths:income_composition_of_resources  1.716e-02  6.180e-03   2.777
## under_five_deaths:schooling                       -1.339e-03  5.273e-04  -2.539
## bmi:diphtheria                                    -1.487e-03  2.222e-04  -6.691
## bmi:hiv_aids                                       4.319e-03  1.999e-03   2.161
## bmi:gdp                                           -2.412e-07  3.413e-07  -0.707
## bmi:income_composition_of_resources                1.020e-02  3.377e-02   0.302
## bmi:schooling                                     -1.633e-02  2.264e-03  -7.214
## diphtheria:hiv_aids                               -9.062e-04  9.076e-04  -0.999
## diphtheria:gdp                                     2.030e-07  5.526e-07   0.367
## diphtheria:income_composition_of_resources         6.137e-02  2.334e-02   2.630
## diphtheria:schooling                              -6.664e-03  1.693e-03  -3.936
## hiv_aids:gdp                                      -3.389e-05  9.961e-06  -3.402
## hiv_aids:income_composition_of_resources           2.017e+00  3.418e-01   5.901
## hiv_aids:schooling                                -5.147e-02  1.627e-02  -3.164
## gdp:income_composition_of_resources               -4.159e-04  1.052e-04  -3.954
## gdp:schooling                                     -3.872e-06  3.694e-06  -1.048
## income_composition_of_resources:schooling          1.435e+00  1.112e-01  12.909
##                                                   Pr(>|t|)    
## (Intercept)                                        < 2e-16 ***
## adult_mortality                                   0.278493    
## under_five_deaths                                 0.017768 *  
## bmi                                                < 2e-16 ***
## diphtheria                                         < 2e-16 ***
## hiv_aids                                           < 2e-16 ***
## gdp                                               1.81e-06 ***
## income_composition_of_resources                   5.64e-07 ***
## schooling                                         1.23e-12 ***
## adult_mortality:under_five_deaths                 0.834099    
## adult_mortality:bmi                               0.574404    
## adult_mortality:diphtheria                        0.002117 ** 
## adult_mortality:hiv_aids                           < 2e-16 ***
## adult_mortality:gdp                               0.736077    
## adult_mortality:income_composition_of_resources   0.304431    
## adult_mortality:schooling                         0.072369 .  
## under_five_deaths:bmi                             0.006011 ** 
## under_five_deaths:diphtheria                      0.608422    
## under_five_deaths:hiv_aids                        0.044778 *  
## under_five_deaths:gdp                             0.718055    
## under_five_deaths:income_composition_of_resources 0.005521 ** 
## under_five_deaths:schooling                       0.011162 *  
## bmi:diphtheria                                    2.70e-11 ***
## bmi:hiv_aids                                      0.030771 *  
## bmi:gdp                                           0.479841    
## bmi:income_composition_of_resources               0.762627    
## bmi:schooling                                     7.12e-13 ***
## diphtheria:hiv_aids                               0.318122    
## diphtheria:gdp                                    0.713441    
## diphtheria:income_composition_of_resources        0.008597 ** 
## diphtheria:schooling                              8.49e-05 ***
## hiv_aids:gdp                                      0.000679 ***
## hiv_aids:income_composition_of_resources          4.08e-09 ***
## hiv_aids:schooling                                0.001573 ** 
## gdp:income_composition_of_resources               7.88e-05 ***
## gdp:schooling                                     0.294661    
## income_composition_of_resources:schooling          < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.564 on 2598 degrees of freedom
## Multiple R-squared:  0.8615, Adjusted R-squared:  0.8596 
## F-statistic: 448.9 on 36 and 2598 DF,  p-value: < 2.2e-16
  • Diagnostic Plots (Significant Interactive Model sig_interactive_model):

  • Test-\(\mathbf{\text{RMSE}} = 3.3629616\)
  • \(R^2 = 0.861496\)

  • Comparison with Full Additive model (full_additive_model):

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ (adult_mortality + under_five_deaths + bmi + 
##     diphtheria + hiv_aids + gdp + income_composition_of_resources + 
##     schooling)^2
## Model 2: life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
##   Res.Df   RSS  Df Sum of Sq      F    Pr(>F)    
## 1   2598 32994                                   
## 2   2615 40985 -17   -7990.9 37.013 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model                          Model_Var              R_2        Test_RMSE
Full Additive Model            full_additive_model    0.8279511  3.745306
Significant Interactive Model  sig_interactive_model  0.8614960  3.362962

  • This model improves both \(R^2\) and \(\text{Test-RMSE}\) compared to full_additive_model, but the assumptions are still suspect and the \(F\)-test still reports a significant difference between the two models. It can still be a good candidate model if we don’t find another model that performs better and adheres to the assumptions more closely.

    Note: We also tried AIC/BIC stepwise search starting from some of these interactive models and did get better \(R^2\) and \(\text{Test-RMSE}\), but the training time was extremely long. Due to time and resource constraints, we focused on finding a model with reasonable training time and resource requirements. The Appendix shows one of the AIC models based on an initial fully interactive model.

After the ad hoc approaches described above, we tried more formal methods of variable selection, starting with backward stepwise selection using AIC.
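A minimal sketch of backward stepwise selection with base R's `step()`, using the built-in mtcars data as a stand-in for the training data. The model and variable names here are assumptions for illustration, not the report's code.

```r
# Start from a full additive model on the stand-in data.
full_fit <- lm(mpg ~ ., data = mtcars)

# Backward elimination by AIC (the default penalty k = 2); trace = 0 silences the log.
aic_fit <- step(full_fit, direction = "backward", trace = 0)

# The same search with a BIC penalty, k = log(n).
bic_fit <- step(full_fit, direction = "backward",
                k = log(nrow(mtcars)), trace = 0)

# extractAIC() returns the equivalent degrees of freedom and the criterion value.
print(extractAIC(aic_fit))
print(formula(aic_fit))
```

The two numbers returned by `extractAIC()` are the number of estimated coefficients and the AIC; a value of 15 in the first position corresponds to 14 predictor terms plus the intercept.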

  • Stepwise search model (AIC):
## [1]   15.000 7263.267
  • Summary of AIC-backward model:
## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + diphtheria + hiv_aids + gdp + thinness_1_19_years + 
##     income_composition_of_resources + schooling, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.5044  -2.2719  -0.1416   2.2613  17.7232 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.532e+01  6.267e-01  88.271  < 2e-16 ***
## statusDeveloping                -1.386e+00  2.571e-01  -5.390 7.66e-08 ***
## adult_mortality                 -1.811e-02  8.229e-04 -22.004  < 2e-16 ***
## infant_deaths                    9.002e-02  8.540e-03  10.541  < 2e-16 ***
## percentage_expenditure           1.644e-04  7.716e-05   2.131 0.033190 *  
## measles                         -1.513e-05  7.761e-06  -1.950 0.051273 .  
## bmi                              3.742e-02  5.115e-03   7.316 3.38e-13 ***
## under_five_deaths               -6.696e-02  6.299e-03 -10.630  < 2e-16 ***
## polio                            2.520e-02  4.664e-03   5.402 7.16e-08 ***
## diphtheria                       3.222e-02  4.638e-03   6.947 4.68e-12 ***
## hiv_aids                        -4.721e-01  1.762e-02 -26.794  < 2e-16 ***
## gdp                              2.871e-05  1.192e-05   2.410 0.016036 *  
## thinness_1_19_years             -8.659e-02  2.387e-02  -3.628 0.000291 ***
## income_composition_of_resources  6.293e+00  6.519e-01   9.653  < 2e-16 ***
## schooling                        7.502e-01  4.357e-02  17.220  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.957 on 2620 degrees of freedom
## Multiple R-squared:  0.8278, Adjusted R-squared:  0.8269 
## F-statistic: 899.8 on 14 and 2620 DF,  p-value: < 2.2e-16
  • Diagnostic plots (AIC - backward model):

  • Test-\(\mathbf{\text{RMSE}} = 3.7476419\)
  • \(R^2 = 0.8278213\)

  • Comparison to Full additive model (full_additive_model):

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + diphtheria + hiv_aids + gdp + thinness_1_19_years + 
##     income_composition_of_resources + schooling
## Model 2: life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1   2620 41016                           
## 2   2615 40985  5    30.916 0.3945 0.8529

Model                                        Model_Var               R_2        Test_RMSE
Full Additive Model                          full_additive_model     0.8279511  3.745306
AIC Backward (based on full additive model)  aic_back_full_additive  0.8278213  3.747642

  • Compared with the full additive model, this stepwise (AIC) model has a slightly lower \(R^2\) and a slightly higher \(\text{Test-RMSE}\), and its Residuals v. Fitted and Normal Q-Q plots look almost the same. However, the \(F\)-test fails to reject the smaller model (p-value \(0.8529\)), so we picked this model for experimenting with transformations to see whether we can improve performance.

    Note: We also built a model using BIC stepwise search. It gave an almost identical result, so we moved ahead with the AIC model; the BIC-based models and the further transformations we tried can be seen in the Appendix.

Before applying transformations, we also looked at how the predictors and the response are distributed.

We first applied a log transformation (log1p) to the predictors.
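Several predictors, such as infant_deaths and measles, have a minimum of 0 in the summary table, and a plain `log()` is not finite at zero. The sketch below shows why `log1p(x)`, which computes \(\log(1 + x)\), is a convenient alternative, and how it can be used directly inside an `lm()` formula. The toy vector and the mtcars model are illustrative assumptions.

```r
# Toy values including a zero, as seen for count-like predictors.
x <- c(0, 1, 10, 1800)

plain_log <- log(x)     # first element is -Inf
safe_log  <- log1p(x)   # log(1 + x), finite for every x >= 0

print(rbind(plain_log, safe_log))

# Transformations can be written inline in the model formula.
fit <- lm(mpg ~ log1p(hp) + log1p(wt), data = mtcars)
print(coef(fit))
```

Writing the transformation inside the formula means `predict()` applies it automatically to new data, so the test set does not need to be transformed by hand.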

## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + log1p(diphtheria) + 
##     log1p(hiv_aids) + log1p(gdp) + log1p(thinness_1_19_years) + 
##     log1p(income_composition_of_resources) + log1p(schooling), 
##     data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.2989  -2.3020  -0.1194   2.3577  14.9968 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            60.62517    1.17019  51.808  < 2e-16 ***
## statusDeveloping                       -2.23092    0.24576  -9.078  < 2e-16 ***
## log1p(adult_mortality)                 -0.70024    0.08079  -8.667  < 2e-16 ***
## log1p(infant_deaths)                    4.65434    0.56784   8.197 3.82e-16 ***
## log1p(percentage_expenditure)           0.18882    0.03140   6.014 2.06e-09 ***
## log1p(measles)                          0.01834    0.03074   0.597 0.550862    
## log1p(bmi)                              0.19048    0.11595   1.643 0.100559    
## log1p(under_five_deaths)               -5.28164    0.54297  -9.727  < 2e-16 ***
## log1p(polio)                            0.58496    0.15150   3.861 0.000116 ***
## log1p(diphtheria)                       0.68393    0.15006   4.558 5.40e-06 ***
## log1p(hiv_aids)                        -5.39946    0.12271 -44.002  < 2e-16 ***
## log1p(gdp)                              0.42038    0.05405   7.777 1.06e-14 ***
## log1p(thinness_1_19_years)             -0.93770    0.13813  -6.789 1.39e-11 ***
## log1p(income_composition_of_resources) 12.15759    0.83225  14.608  < 2e-16 ***
## log1p(schooling)                        1.59942    0.30830   5.188 2.29e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.818 on 2620 degrees of freedom
## Multiple R-squared:  0.8397, Adjusted R-squared:  0.8388 
## F-statistic: 980.3 on 14 and 2620 DF,  p-value: < 2.2e-16

  • \(\text{Test-RMSE} = 3.6268618\)
  • \(R^2 = 0.8396948\)

  • Comparison with simple non-transformed aic_back_full_additive model:

| Model | Model_Var | R_2 | Test_RMSE |
|---|---|---|---|
| AIC Backward (based on full additive model) | aic_back_full_additive | 0.8278213 | 3.747642 |
| AIC Backward with predictor-log-transform | aic_back_full_additive_model_all_log | 0.8396948 | 3.626862 |

  • Both \(R^2\) and \(\text{Test-RMSE}\) have improved. The Residuals v. Fitted and Normal Q-Q plots also look much better, which indicates that this model adheres to the equal variance and normality assumptions better than the non-transformed AIC model.
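
As a minimal sketch of this comparison, the example below fits one model on a raw predictor and one on its `log1p()` transform, then scores both on held-out data. The simulated data, the 80/20 split, and the `rmse()` helper are assumptions made for illustration; they are not taken from the project's Rmd.

```r
set.seed(1)
n <- 500

# Right-skewed, non-negative predictor. log1p(x) = log(1 + x) stays defined at 0,
# which is why it is convenient for variables that can be exactly zero.
sim_df <- data.frame(deaths = rexp(n, rate = 1 / 50))
sim_df$life <- 80 - 4 * log1p(sim_df$deaths) + rnorm(n, sd = 2)

# Assumed 80/20 train/test split.
train_idx <- sample(n, size = 0.8 * n)
train_df <- sim_df[train_idx, ]
test_df <- sim_df[-train_idx, ]

# Hypothetical helper: root mean squared error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

raw_fit <- lm(life ~ deaths, data = train_df)
log_fit <- lm(life ~ log1p(deaths), data = train_df)

# predict() re-applies log1p() to the new data automatically because the
# transformation is written inside the model formula.
raw_rmse <- rmse(test_df$life, predict(raw_fit, newdata = test_df))
log_rmse <- rmse(test_df$life, predict(log_fit, newdata = test_df))

c(raw = raw_rmse, log = log_rmse)
```

Writing the transformation inside the formula, rather than adding a transformed column, keeps training and test data on the same scale without extra bookkeeping.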

We also tried various combinations in which only some of the predictors are log transformed (not all of those experiments are included in this report or the Rmd). The following combination improved performance.

  • Summary (AIC Model - Some predictors log transformed)
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.9458  -2.1318  -0.1674   2.1425  13.6522 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      6.297e+01  9.301e-01  67.699  < 2e-16 ***
## statusDeveloping                -1.583e+00  2.413e-01  -6.559 6.52e-11 ***
## log1p(adult_mortality)          -6.387e-01  7.773e-02  -8.217 3.23e-16 ***
## log1p(infant_deaths)             4.100e+00  5.465e-01   7.501 8.60e-14 ***
## log1p(percentage_expenditure)    1.608e-01  3.049e-02   5.274 1.44e-07 ***
## log1p(measles)                   6.821e-03  2.978e-02   0.229  0.81887    
## log1p(bmi)                       1.446e-01  1.116e-01   1.296  0.19526    
## log1p(under_five_deaths)        -4.596e+00  5.232e-01  -8.785  < 2e-16 ***
## log1p(polio)                     1.955e-01  1.494e-01   1.309  0.19081    
## diphtheria                       2.925e-02  3.959e-03   7.387 2.01e-13 ***
## log1p(hiv_aids)                 -5.346e+00  1.174e-01 -45.536  < 2e-16 ***
## gdp                              3.582e-05  6.636e-06   5.398 7.36e-08 ***
## thinness_1_19_years             -7.049e-02  2.065e-02  -3.414  0.00065 ***
## income_composition_of_resources  7.495e+00  6.021e-01  12.448  < 2e-16 ***
## schooling                        4.982e-01  4.147e-02  12.011  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.668 on 2620 degrees of freedom
## Multiple R-squared:  0.8521, Adjusted R-squared:  0.8513 
## F-statistic:  1078 on 14 and 2620 DF,  p-value: < 2.2e-16
  • Diagnostic Plots (AIC Model - Some predictors log transformed)

  • \(\text{Test-RMSE} = 3.5066411\)
  • \(R^2 = 0.8520542\)

  • Comparison with the model in which all predictors are log transformed (aic_back_full_additive_model_all_log):

| Model | Model_Var | R_2 | Test_RMSE |
|---|---|---|---|
| AIC Backward (based on full additive model) | aic_back_full_additive | 0.8278213 | 3.747642 |
| AIC Backward with predictor-log-transform | aic_back_full_additive_model_all_log | 0.8396948 | 3.626862 |
| AIC Backward with some predictor-log-transform | aic_back_full_additive_model_log | 0.8520542 | 3.506641 |

  • The model with only some predictors log transformed performs better: both \(R^2\) and \(\text{Test-RMSE}\) improved, and the Residuals v. Fitted and Normal Q-Q plots still look fine. Hence this model is an improvement over the model with all predictors log transformed.
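
Comparison tables like the ones in this section can be assembled with a small helper. The sketch below is an assumption about how such a table might be built, not the project's actual code: `compare_models()` and the simulated data are hypothetical, and only the column names `Model_Var`, `R_2` and `Test_RMSE` mirror the report.

```r
set.seed(7)
n <- 300
sim_df <- data.frame(x = runif(n, 0, 100), z = runif(n, 0, 1))
sim_df$y <- 50 - 3 * log1p(sim_df$x) + 10 * sim_df$z + rnorm(n)

train_idx <- sample(n, 240)
train_df <- sim_df[train_idx, ]
test_df <- sim_df[-train_idx, ]

# Hypothetical helper: one row per fitted model with its training R^2 and test RMSE.
compare_models <- function(models, test_data, response) {
  rows <- lapply(names(models), function(name) {
    fit <- models[[name]]
    pred <- predict(fit, newdata = test_data)
    data.frame(
      Model_Var = name,
      R_2 = summary(fit)$r.squared,
      Test_RMSE = sqrt(mean((test_data[[response]] - pred)^2))
    )
  })
  do.call(rbind, rows)
}

models <- list(
  no_log = lm(y ~ x + z, data = train_df),
  all_log = lm(y ~ log1p(x) + log1p(z), data = train_df),
  some_log = lm(y ~ log1p(x) + z, data = train_df)
)

results <- compare_models(models, test_df, "y")
print(results)
```

Keeping the candidate models in a named list makes it easy to add another transformation variant and regenerate the whole table.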

Finally, we added polynomial terms for some of the predictors (here too we tried a number of models that are not included in the report) and found one that improves performance.

  • Summary (AIC Model - Some predictors log transformed and some polynomial terms)
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2), 
##     data = non_cat_predictor_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.696  -2.005  -0.192   2.010  14.328 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           64.733753   0.964993  67.082  < 2e-16 ***
## statusDeveloping                      -0.085884   0.244199  -0.352  0.72509    
## log1p(adult_mortality)                -0.499649   0.073499  -6.798 1.31e-11 ***
## log1p(infant_deaths)                   3.802834   0.517162   7.353 2.57e-13 ***
## log1p(percentage_expenditure)          0.080119   0.028800   2.782  0.00544 ** 
## log1p(measles)                        -0.047032   0.028103  -1.674  0.09434 .  
## log1p(bmi)                            -0.022992   0.105340  -0.218  0.82724    
## log1p(under_five_deaths)              -4.026331   0.496260  -8.113 7.49e-16 ***
## log1p(polio)                           0.118303   0.140464   0.842  0.39974    
## diphtheria                             0.028601   0.003728   7.671 2.39e-14 ***
## log1p(hiv_aids)                       -4.838593   0.113463 -42.645  < 2e-16 ***
## log1p(gdp)                             0.083404   0.050704   1.645  0.10010    
## thinness_1_19_years                   -0.028591   0.019578  -1.460  0.14430    
## income_composition_of_resources      -17.562294   1.634000 -10.748  < 2e-16 ***
## I(income_composition_of_resources^2)  33.811925   2.025711  16.691  < 2e-16 ***
## schooling                              0.447283   0.107670   4.154 3.37e-05 ***
## I(schooling^2)                        -0.013301   0.005309  -2.505  0.01230 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.447 on 2618 degrees of freedom
## Multiple R-squared:  0.8694, Adjusted R-squared:  0.8686 
## F-statistic:  1089 on 16 and 2618 DF,  p-value: < 2.2e-16
  • Diagnostics (AIC Model - Some predictors log transformed and some polynomial terms)

  • Comparison with AIC Model - Some predictors log transformed (aic_back_full_additive_model_log):

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ status + log1p(adult_mortality) + log1p(infant_deaths) + 
##     log1p(percentage_expenditure) + log1p(measles) + log1p(bmi) + 
##     log1p(under_five_deaths) + log1p(polio) + diphtheria + log1p(hiv_aids) + 
##     gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## Model 2: life_expectancy ~ status + log1p(adult_mortality) + log1p(infant_deaths) + 
##     log1p(percentage_expenditure) + log1p(measles) + log1p(bmi) + 
##     log1p(under_five_deaths) + log1p(polio) + diphtheria + log1p(hiv_aids) + 
##     log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2)
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1   2620 35243                                  
## 2   2618 31109  2    4133.8 173.94 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

| Model | Model_Var | R_2 | Test_RMSE |
|---|---|---|---|
| AIC Backward with predictor-log-transform | aic_back_full_additive_model_all_log | 0.8396948 | 3.626862 |
| AIC Backward with some predictor-log-transform | aic_back_full_additive_model_log | 0.8520542 | 3.506641 |
| AIC (with some log and poly terms) | aic_back_full_additive_model_log_poly | 0.8694072 | 3.431698 |

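  • Both \(R^2\) and \(\text{Test-RMSE}\) improved, and the \(F\)-test rejects the null hypothesis, so we select the bigger model with polynomial terms. One caveat: Model 2 also replaces gdp with log1p(gdp), so the two models are not strictly nested and the \(F\)-test should be read as indicative rather than exact.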
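
The sketch below shows, on simulated data, how a squared term is added with `I()` and how the smaller and bigger models are compared with `anova()`. The variable names are made up for the example. Unlike the comparison above, the two models here are strictly nested (the bigger one only adds a term), which is the setting in which the F-test is exact.

```r
set.seed(3)
n <- 400
sim_df <- data.frame(
  income_index = runif(n, 0, 1),
  school = runif(n, 0, 20)
)
sim_df$life <- 60 - 15 * sim_df$income_index + 30 * sim_df$income_index^2 +
  0.4 * sim_df$school + rnorm(n, sd = 2)

linear_fit <- lm(life ~ income_index + school, data = sim_df)

# I() protects ^ so that it means arithmetic squaring. Without I(), ^ in a
# formula is the crossing operator and income_index^2 would reduce to income_index.
poly_fit <- lm(life ~ income_index + I(income_index^2) + school, data = sim_df)

# Nested F-test. H0: the coefficient of the squared term is zero.
f_test <- anova(linear_fit, poly_fit)
p_value <- f_test[["Pr(>F)"]][2]
print(f_test)
```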

  • We can also compare this model to the interactive model sig_interative_model, which achieved a high \(R^2\) and a low \(\text{Test-RMSE}\):

| Model | Model_Var | R_2 | Test_RMSE |
|---|---|---|---|
| Significant Interactive Model | sig_interative_model | 0.8614960 | 3.362962 |
| AIC (with some log and poly terms) | aic_back_full_additive_model_log_poly | 0.8694072 | 3.431698 |

This model (aic_back_full_additive_model_log_poly) has a slightly better \(R^2\) (0.8694 vs. 0.8615) and a slightly worse \(\text{Test-RMSE}\) (3.432 vs. 3.363), but its Residuals v. Fitted and especially Normal Q-Q plots look better. We therefore consider aic_back_full_additive_model_log_poly one of the best models we experimented with as part of this project.

Note: We believe it is possible to find an even better performing model, and in fact we show one such model in the Appendix. Considering time, scope and resource constraints, we felt that this model is good enough for the purpose of this project.

Looking at the diagnostic plots for all the models we experimented with, we noticed that some outliers are affecting our models. So, as a last step, we removed the outliers and refitted the best model we selected (aic_back_full_additive_model_log_poly) on the cleaned training data.

  • Removing outliers:

## [1] 2635
## [1] 2624

  • Fitting the model on cleaned dataset:
  • Summary (AIC Model - Some predictors log transformed and some polynomial terms) with cleaned dataset:
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2), 
##     data = life_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.8054  -2.0300  -0.1986   1.9448  14.4270 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           64.671931   0.937805  68.961  < 2e-16 ***
## statusDeveloping                      -0.143085   0.237026  -0.604 0.546116    
## log1p(adult_mortality)                -0.481821   0.071518  -6.737 1.98e-11 ***
## log1p(infant_deaths)                   3.340107   0.503901   6.628 4.11e-11 ***
## log1p(percentage_expenditure)          0.092187   0.027977   3.295 0.000997 ***
## log1p(measles)                        -0.052996   0.027352  -1.938 0.052790 .  
## log1p(bmi)                            -0.006574   0.102268  -0.064 0.948752    
## log1p(under_five_deaths)              -3.559551   0.483729  -7.359 2.48e-13 ***
## log1p(polio)                           0.150127   0.136344   1.101 0.270961    
## diphtheria                             0.027720   0.003620   7.657 2.66e-14 ***
## log1p(hiv_aids)                       -4.859491   0.111102 -43.739  < 2e-16 ***
## log1p(gdp)                             0.071064   0.049244   1.443 0.149116    
## thinness_1_19_years                   -0.035335   0.019039  -1.856 0.063582 .  
## income_composition_of_resources      -17.612469   1.585858 -11.106  < 2e-16 ***
## I(income_composition_of_resources^2)  33.618142   1.966110  17.099  < 2e-16 ***
## schooling                              0.464705   0.104528   4.446 9.13e-06 ***
## I(schooling^2)                        -0.014066   0.005153  -2.730 0.006379 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.345 on 2607 degrees of freedom
## Multiple R-squared:  0.8732, Adjusted R-squared:  0.8724 
## F-statistic:  1122 on 16 and 2607 DF,  p-value: < 2.2e-16
  • Diagnostics (AIC Model - Some predictors log transformed and some polynomial terms) with cleaned dataset:

  • Comparison of models build on full vs cleaned (outlier removed) dataset:

| Model | Model_Var | R_2 | Test_RMSE |
|---|---|---|---|
| AIC (with some log and poly terms) | aic_back_full_additive_model_log_poly | 0.8694072 | 3.431698 |
| AIC (Log & Poly) - outlier removed | aic_back_full_additive_model_log_poly_no_out | 0.8731635 | 3.443616 |

  • As expected, removing the outliers improves \(R^2\) (0.8732 vs. 0.8694), but it slightly increases \(\text{Test-RMSE}\) (3.444 vs. 3.432). The Residuals v. Fitted and Normal Q-Q plots look fine (the lower tail in the Normal Q-Q plot is much shorter). Overall, we do not see much improvement from removing outliers. In addition, the outliers in our case appear to be valid observations (as per our understanding of the dataset), so removing them does not seem worth the risk.
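
The report does not show which rule was used to identify the removed observations. As one possible approach, the sketch below flags observations whose standardized residual exceeds 3 in absolute value and refits on the remaining rows. The threshold, the simulated data and the planted outliers are all assumptions made for illustration.

```r
set.seed(11)
n <- 200
sim_df <- data.frame(x = rnorm(n))
sim_df$y <- 1 + 2 * sim_df$x + rnorm(n)

# Plant three gross outliers so there is something to detect.
sim_df$y[1:3] <- sim_df$y[1:3] + 15

fit <- lm(y ~ x, data = sim_df)

# Assumed rule (not necessarily the project's): |standardized residual| > 3.
flagged <- abs(rstandard(fit)) > 3
clean_df <- sim_df[!flagged, ]

# Refit the same formula on the cleaned data.
clean_fit <- lm(y ~ x, data = clean_df)

# Row counts before and after, analogous to the two counts printed above.
nrow(sim_df)
nrow(clean_df)
```

Because the flagged rows are chosen using the training fit itself, in-sample measures such as \(R^2\) are expected to improve after removal; performance on held-out data is the fairer check, which is consistent with the comparison above.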

In the end, the final model we picked is aic_back_full_additive_model_log_poly: the model obtained from an AIC stepwise backward search on the full additive model, with some predictors log transformed and some higher-degree polynomial terms added. The combined performance results are presented in the Results section below. Since most of the results have already been shown in this (Methods and Results) section, the Results section presents the summary and diagnostic plots again only for the final model.


Results

Performance results for all the models we experimented with as part of this project

| Model | Model_Var | R_2 | Test_RMSE |
|---|---|---|---|
| Full Additive Model | full_additve_model | 0.8279511 | 3.745306 |
| Significant Additive Model | sig_additive_model | 0.8170003 | 3.915026 |
| Significant Interactive Model | sig_interative_model | 0.8614960 | 3.362962 |
| AIC Backward (based on full additive model) | aic_back_full_additive | 0.8278213 | 3.747642 |
| AIC Backward with predictor-log-transform | aic_back_full_additive_model_all_log | 0.8396948 | 3.626862 |
| AIC Backward with some predictor-log-transform | aic_back_full_additive_model_log | 0.8520542 | 3.506641 |
| AIC (with some log and poly terms) | aic_back_full_additive_model_log_poly | 0.8694072 | 3.431698 |
| AIC (Log & Poly) - outlier removed | aic_back_full_additive_model_log_poly_no_out | 0.8731635 | 3.443616 |

Summary for the best model (aic_back_full_additive_model_log_poly)

## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2), 
##     data = non_cat_predictor_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.696  -2.005  -0.192   2.010  14.328 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           64.733753   0.964993  67.082  < 2e-16 ***
## statusDeveloping                      -0.085884   0.244199  -0.352  0.72509    
## log1p(adult_mortality)                -0.499649   0.073499  -6.798 1.31e-11 ***
## log1p(infant_deaths)                   3.802834   0.517162   7.353 2.57e-13 ***
## log1p(percentage_expenditure)          0.080119   0.028800   2.782  0.00544 ** 
## log1p(measles)                        -0.047032   0.028103  -1.674  0.09434 .  
## log1p(bmi)                            -0.022992   0.105340  -0.218  0.82724    
## log1p(under_five_deaths)              -4.026331   0.496260  -8.113 7.49e-16 ***
## log1p(polio)                           0.118303   0.140464   0.842  0.39974    
## diphtheria                             0.028601   0.003728   7.671 2.39e-14 ***
## log1p(hiv_aids)                       -4.838593   0.113463 -42.645  < 2e-16 ***
## log1p(gdp)                             0.083404   0.050704   1.645  0.10010    
## thinness_1_19_years                   -0.028591   0.019578  -1.460  0.14430    
## income_composition_of_resources      -17.562294   1.634000 -10.748  < 2e-16 ***
## I(income_composition_of_resources^2)  33.811925   2.025711  16.691  < 2e-16 ***
## schooling                              0.447283   0.107670   4.154 3.37e-05 ***
## I(schooling^2)                        -0.013301   0.005309  -2.505  0.01230 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.447 on 2618 degrees of freedom
## Multiple R-squared:  0.8694, Adjusted R-squared:  0.8686 
## F-statistic:  1089 on 16 and 2618 DF,  p-value: < 2.2e-16

Diagnostic plots for the best model that we have picked (aic_back_full_additive_model_log_poly)

Performance Metrics for best model


Discussion

Model Building and Results

Data Analysis and exploration

Conclusion

Appendix

Experimental model: a pairwise interaction model built from the terms of the log-poly model obtained in the Methods section, fitted on the same dataset (non_cat_predictor_df).

## 
## Call:
## lm(formula = life_expectancy ~ (status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2))^2, 
##     data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.7606  -1.6682  -0.0832   1.4933  13.9105 
## 
## Coefficients: (1 not defined because of singularities)
##                                                                        Estimate
## (Intercept)                                                           4.687e+00
## statusDeveloping                                                      2.872e+01
## log1p(adult_mortality)                                                2.022e+00
## log1p(infant_deaths)                                                 -9.325e+00
## log1p(percentage_expenditure)                                         1.526e+00
## log1p(measles)                                                        6.442e-01
## log1p(bmi)                                                            2.073e+00
## log1p(under_five_deaths)                                              7.241e+00
## log1p(polio)                                                          4.773e+00
## diphtheria                                                            2.170e-02
## log1p(hiv_aids)                                                       9.528e-01
## log1p(gdp)                                                           -3.905e-01
## thinness_1_19_years                                                  -1.865e+00
## income_composition_of_resources                                       1.367e+02
## I(income_composition_of_resources^2)                                  6.209e+01
## schooling                                                            -2.438e+00
## I(schooling^2)                                                        3.215e-01
## statusDeveloping:log1p(adult_mortality)                              -3.693e-01
## statusDeveloping:log1p(infant_deaths)                                 5.124e-01
## statusDeveloping:log1p(percentage_expenditure)                       -1.458e-02
## statusDeveloping:log1p(measles)                                      -3.950e-01
## statusDeveloping:log1p(bmi)                                           3.458e-01
## statusDeveloping:log1p(under_five_deaths)                             5.748e-03
## statusDeveloping:log1p(polio)                                        -6.056e-01
## statusDeveloping:diphtheria                                          -1.935e-02
## statusDeveloping:log1p(hiv_aids)                                             NA
## statusDeveloping:log1p(gdp)                                           5.013e-02
## statusDeveloping:thinness_1_19_years                                  1.582e+00
## statusDeveloping:income_composition_of_resources                     -1.460e+02
## statusDeveloping:I(income_composition_of_resources^2)                 9.004e+01
## statusDeveloping:schooling                                            3.628e+00
## statusDeveloping:I(schooling^2)                                      -9.668e-02
## log1p(adult_mortality):log1p(infant_deaths)                           1.541e-01
## log1p(adult_mortality):log1p(percentage_expenditure)                  2.108e-02
## log1p(adult_mortality):log1p(measles)                                -2.151e-02
## log1p(adult_mortality):log1p(bmi)                                     1.063e-01
## log1p(adult_mortality):log1p(under_five_deaths)                      -2.622e-01
## log1p(adult_mortality):log1p(polio)                                  -3.196e-02
## log1p(adult_mortality):diphtheria                                     7.569e-04
## log1p(adult_mortality):log1p(hiv_aids)                                2.734e-01
## log1p(adult_mortality):log1p(gdp)                                    -1.232e-01
## log1p(adult_mortality):thinness_1_19_years                            5.446e-02
## log1p(adult_mortality):income_composition_of_resources               -2.089e+00
## log1p(adult_mortality):I(income_composition_of_resources^2)           4.024e+00
## log1p(adult_mortality):schooling                                     -2.226e-01
## log1p(adult_mortality):I(schooling^2)                                 3.533e-03
## log1p(infant_deaths):log1p(percentage_expenditure)                    3.552e-01
## log1p(infant_deaths):log1p(measles)                                   2.453e-01
## log1p(infant_deaths):log1p(bmi)                                       9.939e-02
## log1p(infant_deaths):log1p(under_five_deaths)                        -2.919e-02
## log1p(infant_deaths):log1p(polio)                                     2.505e+00
## log1p(infant_deaths):diphtheria                                      -9.217e-02
## log1p(infant_deaths):log1p(hiv_aids)                                  2.787e+00
## log1p(infant_deaths):log1p(gdp)                                      -8.342e-01
## log1p(infant_deaths):thinness_1_19_years                             -4.666e-02
## log1p(infant_deaths):income_composition_of_resources                 -1.194e+00
## log1p(infant_deaths):I(income_composition_of_resources^2)            -3.666e+00
## log1p(infant_deaths):schooling                                        2.423e+00
## log1p(infant_deaths):I(schooling^2)                                  -9.912e-02
## log1p(percentage_expenditure):log1p(measles)                         -5.631e-02
## log1p(percentage_expenditure):log1p(bmi)                              1.239e-01
## log1p(percentage_expenditure):log1p(under_five_deaths)               -2.771e-01
## log1p(percentage_expenditure):log1p(polio)                           -2.154e-01
## log1p(percentage_expenditure):diphtheria                             -2.570e-03
## log1p(percentage_expenditure):log1p(hiv_aids)                         3.463e-03
## log1p(percentage_expenditure):log1p(gdp)                             -1.202e-02
## log1p(percentage_expenditure):thinness_1_19_years                     4.391e-03
## log1p(percentage_expenditure):income_composition_of_resources        -5.156e-01
## log1p(percentage_expenditure):I(income_composition_of_resources^2)    9.760e-01
## log1p(percentage_expenditure):schooling                              -1.010e-01
## log1p(percentage_expenditure):I(schooling^2)                          2.877e-03
## log1p(measles):log1p(bmi)                                            -6.616e-02
## log1p(measles):log1p(under_five_deaths)                              -2.277e-01
## log1p(measles):log1p(polio)                                           3.269e-02
## log1p(measles):diphtheria                                            -8.626e-04
## log1p(measles):log1p(hiv_aids)                                        6.391e-02
## log1p(measles):log1p(gdp)                                             7.370e-02
## log1p(measles):thinness_1_19_years                                    2.433e-02
## log1p(measles):income_composition_of_resources                       -1.872e+00
## log1p(measles):I(income_composition_of_resources^2)                   2.247e+00
## log1p(measles):schooling                                             -2.732e-02
## log1p(measles):I(schooling^2)                                        -3.103e-04
## log1p(bmi):log1p(under_five_deaths)                                  -3.736e-02
## log1p(bmi):log1p(polio)                                               5.995e-02
## log1p(bmi):diphtheria                                                -2.322e-02
## log1p(bmi):log1p(hiv_aids)                                           -2.527e-01
## log1p(bmi):log1p(gdp)                                                -8.366e-02
## log1p(bmi):thinness_1_19_years                                       -1.637e-02
## log1p(bmi):income_composition_of_resources                           -1.560e+00
## log1p(bmi):I(income_composition_of_resources^2)                       2.569e+00
## log1p(bmi):schooling                                                 -5.365e-03
## log1p(bmi):I(schooling^2)                                            -6.066e-03
## log1p(under_five_deaths):log1p(polio)                                -2.604e+00
## log1p(under_five_deaths):diphtheria                                   9.389e-02
## log1p(under_five_deaths):log1p(hiv_aids)                             -2.599e+00
## log1p(under_five_deaths):log1p(gdp)                                   7.381e-01
## log1p(under_five_deaths):thinness_1_19_years                         -4.204e-03
## log1p(under_five_deaths):income_composition_of_resources              6.555e+00
## log1p(under_five_deaths):I(income_composition_of_resources^2)        -7.155e-01
## log1p(under_five_deaths):schooling                                   -2.191e+00
## log1p(under_five_deaths):I(schooling^2)                               8.924e-02
## log1p(polio):diphtheria                                               6.351e-03
## log1p(polio):log1p(hiv_aids)                                          3.251e-01
## log1p(polio):log1p(gdp)                                               1.444e-01
## log1p(polio):thinness_1_19_years                                      6.769e-03
## log1p(polio):income_composition_of_resources                         -1.546e+00
## log1p(polio):I(income_composition_of_resources^2)                     7.676e-01
## log1p(polio):schooling                                               -5.922e-01
## log1p(polio):I(schooling^2)                                           2.090e-02
## diphtheria:log1p(hiv_aids)                                           -2.197e-02
## diphtheria:log1p(gdp)                                                 8.345e-03
## diphtheria:thinness_1_19_years                                       -8.187e-04
## diphtheria:income_composition_of_resources                            7.047e-02
## diphtheria:I(income_composition_of_resources^2)                      -9.873e-02
## diphtheria:schooling                                                  8.612e-04
## diphtheria:I(schooling^2)                                            -1.215e-05
## log1p(hiv_aids):log1p(gdp)                                           -2.730e-01
## log1p(hiv_aids):thinness_1_19_years                                  -7.004e-02
## log1p(hiv_aids):income_composition_of_resources                      -9.252e+00
## log1p(hiv_aids):I(income_composition_of_resources^2)                  6.168e+00
## log1p(hiv_aids):schooling                                            -4.011e-01
## log1p(hiv_aids):I(schooling^2)                                        3.965e-02
## log1p(gdp):thinness_1_19_years                                        4.870e-03
## log1p(gdp):income_composition_of_resources                           -2.458e+00
## log1p(gdp):I(income_composition_of_resources^2)                       1.863e+00
## log1p(gdp):schooling                                                  9.957e-02
## log1p(gdp):I(schooling^2)                                            -4.867e-03
## thinness_1_19_years:income_composition_of_resources                   1.660e+00
## thinness_1_19_years:I(income_composition_of_resources^2)             -2.416e+00
## thinness_1_19_years:schooling                                        -3.226e-02
## thinness_1_19_years:I(schooling^2)                                    3.117e-03
## income_composition_of_resources:I(income_composition_of_resources^2) -1.807e+02
## income_composition_of_resources:schooling                             2.778e-01
## income_composition_of_resources:I(schooling^2)                       -4.927e-01
## I(income_composition_of_resources^2):schooling                        2.731e+00
## I(income_composition_of_resources^2):I(schooling^2)                   4.835e-01
## schooling:I(schooling^2)                                             -9.237e-03
##                                                                      Std. Error
## (Intercept)                                                           3.087e+01
## statusDeveloping                                                      2.981e+01
## log1p(adult_mortality)                                                1.057e+00
## log1p(infant_deaths)                                                  9.013e+00
## log1p(percentage_expenditure)                                         4.925e-01
## log1p(measles)                                                        4.141e-01
## log1p(bmi)                                                            1.327e+00
## log1p(under_five_deaths)                                              8.554e+00
## log1p(polio)                                                          1.961e+00
## diphtheria                                                            5.413e-02
## log1p(hiv_aids)                                                       1.871e+00
## log1p(gdp)                                                            7.485e-01
## thinness_1_19_years                                                   3.696e-01
## income_composition_of_resources                                       8.104e+01
## I(income_composition_of_resources^2)                                  6.116e+01
## schooling                                                             2.974e+00
## I(schooling^2)                                                        1.700e-01
## statusDeveloping:log1p(adult_mortality)                               2.488e-01
## statusDeveloping:log1p(infant_deaths)                                 1.507e+00
## statusDeveloping:log1p(percentage_expenditure)                        8.998e-02
## statusDeveloping:log1p(measles)                                       1.036e-01
## statusDeveloping:log1p(bmi)                                           3.371e-01
## statusDeveloping:log1p(under_five_deaths)                             1.439e+00
## statusDeveloping:log1p(polio)                                         1.121e+00
## statusDeveloping:diphtheria                                           2.401e-02
## statusDeveloping:log1p(hiv_aids)                                             NA
## statusDeveloping:log1p(gdp)                                           1.718e-01
## statusDeveloping:thinness_1_19_years                                  2.506e-01
## statusDeveloping:income_composition_of_resources                      7.577e+01
## statusDeveloping:I(income_composition_of_resources^2)                 4.609e+01
## statusDeveloping:schooling                                            2.073e+00
## statusDeveloping:I(schooling^2)                                       6.869e-02
## log1p(adult_mortality):log1p(infant_deaths)                           5.764e-01
## log1p(adult_mortality):log1p(percentage_expenditure)                  3.215e-02
## log1p(adult_mortality):log1p(measles)                                 2.757e-02
## log1p(adult_mortality):log1p(bmi)                                     1.058e-01
## log1p(adult_mortality):log1p(under_five_deaths)                       5.517e-01
## log1p(adult_mortality):log1p(polio)                                   1.507e-01
## log1p(adult_mortality):diphtheria                                     4.011e-03
## log1p(adult_mortality):log1p(hiv_aids)                                1.065e-01
## log1p(adult_mortality):log1p(gdp)                                     5.609e-02
## log1p(adult_mortality):thinness_1_19_years                            1.953e-02
## log1p(adult_mortality):income_composition_of_resources                1.887e+00
## log1p(adult_mortality):I(income_composition_of_resources^2)           2.179e+00
## log1p(adult_mortality):schooling                                      1.431e-01
## log1p(adult_mortality):I(schooling^2)                                 6.497e-03
## log1p(infant_deaths):log1p(percentage_expenditure)                    2.124e-01
## log1p(infant_deaths):log1p(measles)                                   2.095e-01
## log1p(infant_deaths):log1p(bmi)                                       7.344e-01
## log1p(infant_deaths):log1p(under_five_deaths)                         4.617e-02
## log1p(infant_deaths):log1p(polio)                                     1.721e+00
## log1p(infant_deaths):diphtheria                                       4.536e-02
## log1p(infant_deaths):log1p(hiv_aids)                                  1.151e+00
## log1p(infant_deaths):log1p(gdp)                                       3.815e-01
## log1p(infant_deaths):thinness_1_19_years                              1.668e-01
## log1p(infant_deaths):income_composition_of_resources                  1.741e+01
## log1p(infant_deaths):I(income_composition_of_resources^2)             1.699e+01
## log1p(infant_deaths):schooling                                        1.034e+00
## log1p(infant_deaths):I(schooling^2)                                   4.424e-02
## log1p(percentage_expenditure):log1p(measles)                          1.150e-02
## log1p(percentage_expenditure):log1p(bmi)                              4.043e-02
## log1p(percentage_expenditure):log1p(under_five_deaths)                2.041e-01
## log1p(percentage_expenditure):log1p(polio)                            6.401e-02
## log1p(percentage_expenditure):diphtheria                              1.621e-03
## log1p(percentage_expenditure):log1p(hiv_aids)                         6.224e-02
## log1p(percentage_expenditure):log1p(gdp)                              2.581e-02
## log1p(percentage_expenditure):thinness_1_19_years                     9.474e-03
## log1p(percentage_expenditure):income_composition_of_resources         7.771e-01
## log1p(percentage_expenditure):I(income_composition_of_resources^2)    8.813e-01
## log1p(percentage_expenditure):schooling                               5.619e-02
## log1p(percentage_expenditure):I(schooling^2)                          2.480e-03
## log1p(measles):log1p(bmi)                                             4.156e-02
## log1p(measles):log1p(under_five_deaths)                               2.022e-01
## log1p(measles):log1p(polio)                                           5.851e-02
## log1p(measles):diphtheria                                             1.476e-03
## log1p(measles):log1p(hiv_aids)                                        4.454e-02
## log1p(measles):log1p(gdp)                                             1.928e-02
## log1p(measles):thinness_1_19_years                                    8.494e-03
## log1p(measles):income_composition_of_resources                        8.315e-01
## log1p(measles):I(income_composition_of_resources^2)                   9.237e-01
## log1p(measles):schooling                                              5.558e-02
## log1p(measles):I(schooling^2)                                         2.595e-03
## log1p(bmi):log1p(under_five_deaths)                                   7.043e-01
## log1p(bmi):log1p(polio)                                               1.940e-01
## log1p(bmi):diphtheria                                                 5.523e-03
## log1p(bmi):log1p(hiv_aids)                                            1.882e-01
## log1p(bmi):log1p(gdp)                                                 7.040e-02
## log1p(bmi):thinness_1_19_years                                        3.330e-02
## log1p(bmi):income_composition_of_resources                            2.480e+00
## log1p(bmi):I(income_composition_of_resources^2)                       3.008e+00
## log1p(bmi):schooling                                                  1.603e-01
## log1p(bmi):I(schooling^2)                                             7.662e-03
## log1p(under_five_deaths):log1p(polio)                                 1.649e+00
## log1p(under_five_deaths):diphtheria                                   4.335e-02
## log1p(under_five_deaths):log1p(hiv_aids)                              1.110e+00
## log1p(under_five_deaths):log1p(gdp)                                   3.658e-01
## log1p(under_five_deaths):thinness_1_19_years                          1.633e-01
## log1p(under_five_deaths):income_composition_of_resources              1.629e+01
## log1p(under_five_deaths):I(income_composition_of_resources^2)         1.609e+01
## log1p(under_five_deaths):schooling                                    9.419e-01
## log1p(under_five_deaths):I(schooling^2)                               4.044e-02
## log1p(polio):diphtheria                                               3.712e-03
## log1p(polio):log1p(hiv_aids)                                          2.035e-01
## log1p(polio):log1p(gdp)                                               1.172e-01
## log1p(polio):thinness_1_19_years                                      3.905e-02
## log1p(polio):income_composition_of_resources                          3.322e+00
## log1p(polio):I(income_composition_of_resources^2)                     4.561e+00
## log1p(polio):schooling                                                3.696e-01
## log1p(polio):I(schooling^2)                                           1.855e-02
## diphtheria:log1p(hiv_aids)                                            6.180e-03
## diphtheria:log1p(gdp)                                                 2.911e-03
## diphtheria:thinness_1_19_years                                        1.119e-03
## diphtheria:income_composition_of_resources                            8.607e-02
## diphtheria:I(income_composition_of_resources^2)                       1.182e-01
## diphtheria:schooling                                                  8.690e-03
## diphtheria:I(schooling^2)                                             4.431e-04
## log1p(hiv_aids):log1p(gdp)                                            9.226e-02
## log1p(hiv_aids):thinness_1_19_years                                   3.130e-02
## log1p(hiv_aids):income_composition_of_resources                       4.847e+00
## log1p(hiv_aids):I(income_composition_of_resources^2)                  6.064e+00
## log1p(hiv_aids):schooling                                             3.515e-01
## log1p(hiv_aids):I(schooling^2)                                        2.026e-02
## log1p(gdp):thinness_1_19_years                                        1.363e-02
## log1p(gdp):income_composition_of_resources                            1.313e+00
## log1p(gdp):I(income_composition_of_resources^2)                       1.569e+00
## log1p(gdp):schooling                                                  9.065e-02
## log1p(gdp):I(schooling^2)                                             4.261e-03
## thinness_1_19_years:income_composition_of_resources                   5.019e-01
## thinness_1_19_years:I(income_composition_of_resources^2)              7.237e-01
## thinness_1_19_years:schooling                                         5.040e-02
## thinness_1_19_years:I(schooling^2)                                    2.914e-03
## income_composition_of_resources:I(income_composition_of_resources^2)  3.434e+01
## income_composition_of_resources:schooling                             2.124e+00
## income_composition_of_resources:I(schooling^2)                        1.196e-01
## I(income_composition_of_resources^2):schooling                        4.259e+00
## I(income_composition_of_resources^2):I(schooling^2)                   1.357e-01
## schooling:I(schooling^2)                                              4.278e-03
##                                                                      t value
## (Intercept)                                                            0.152
## statusDeveloping                                                       0.963
## log1p(adult_mortality)                                                 1.914
## log1p(infant_deaths)                                                  -1.035
## log1p(percentage_expenditure)                                          3.098
## log1p(measles)                                                         1.556
## log1p(bmi)                                                             1.562
## log1p(under_five_deaths)                                               0.847
## log1p(polio)                                                           2.434
## diphtheria                                                             0.401
## log1p(hiv_aids)                                                        0.509
## log1p(gdp)                                                            -0.522
## thinness_1_19_years                                                   -5.045
## income_composition_of_resources                                        1.687
## I(income_composition_of_resources^2)                                   1.015
## schooling                                                             -0.820
## I(schooling^2)                                                         1.891
## statusDeveloping:log1p(adult_mortality)                               -1.485
## statusDeveloping:log1p(infant_deaths)                                  0.340
## statusDeveloping:log1p(percentage_expenditure)                        -0.162
## statusDeveloping:log1p(measles)                                       -3.813
## statusDeveloping:log1p(bmi)                                            1.026
## statusDeveloping:log1p(under_five_deaths)                              0.004
## statusDeveloping:log1p(polio)                                         -0.540
## statusDeveloping:diphtheria                                           -0.806
## statusDeveloping:log1p(hiv_aids)                                          NA
## statusDeveloping:log1p(gdp)                                            0.292
## statusDeveloping:thinness_1_19_years                                   6.312
## statusDeveloping:income_composition_of_resources                      -1.927
## statusDeveloping:I(income_composition_of_resources^2)                  1.954
## statusDeveloping:schooling                                             1.750
## statusDeveloping:I(schooling^2)                                       -1.408
## log1p(adult_mortality):log1p(infant_deaths)                            0.267
## log1p(adult_mortality):log1p(percentage_expenditure)                   0.656
## log1p(adult_mortality):log1p(measles)                                 -0.780
## log1p(adult_mortality):log1p(bmi)                                      1.004
## log1p(adult_mortality):log1p(under_five_deaths)                       -0.475
## log1p(adult_mortality):log1p(polio)                                   -0.212
## log1p(adult_mortality):diphtheria                                      0.189
## log1p(adult_mortality):log1p(hiv_aids)                                 2.568
## log1p(adult_mortality):log1p(gdp)                                     -2.197
## log1p(adult_mortality):thinness_1_19_years                             2.789
## log1p(adult_mortality):income_composition_of_resources                -1.107
## log1p(adult_mortality):I(income_composition_of_resources^2)            1.847
## log1p(adult_mortality):schooling                                      -1.556
## log1p(adult_mortality):I(schooling^2)                                  0.544
## log1p(infant_deaths):log1p(percentage_expenditure)                     1.672
## log1p(infant_deaths):log1p(measles)                                    1.171
## log1p(infant_deaths):log1p(bmi)                                        0.135
## log1p(infant_deaths):log1p(under_five_deaths)                         -0.632
## log1p(infant_deaths):log1p(polio)                                      1.455
## log1p(infant_deaths):diphtheria                                       -2.032
## log1p(infant_deaths):log1p(hiv_aids)                                   2.422
## log1p(infant_deaths):log1p(gdp)                                       -2.187
## log1p(infant_deaths):thinness_1_19_years                              -0.280
## log1p(infant_deaths):income_composition_of_resources                  -0.069
## log1p(infant_deaths):I(income_composition_of_resources^2)             -0.216
## log1p(infant_deaths):schooling                                         2.343
## log1p(infant_deaths):I(schooling^2)                                   -2.240
## log1p(percentage_expenditure):log1p(measles)                          -4.899
## log1p(percentage_expenditure):log1p(bmi)                               3.065
## log1p(percentage_expenditure):log1p(under_five_deaths)                -1.358
## log1p(percentage_expenditure):log1p(polio)                            -3.365
## log1p(percentage_expenditure):diphtheria                              -1.585
## log1p(percentage_expenditure):log1p(hiv_aids)                          0.056
## log1p(percentage_expenditure):log1p(gdp)                              -0.466
## log1p(percentage_expenditure):thinness_1_19_years                      0.463
## log1p(percentage_expenditure):income_composition_of_resources         -0.663
## log1p(percentage_expenditure):I(income_composition_of_resources^2)     1.108
## log1p(percentage_expenditure):schooling                               -1.797
## log1p(percentage_expenditure):I(schooling^2)                           1.160
## log1p(measles):log1p(bmi)                                             -1.592
## log1p(measles):log1p(under_five_deaths)                               -1.126
## log1p(measles):log1p(polio)                                            0.559
## log1p(measles):diphtheria                                             -0.584
## log1p(measles):log1p(hiv_aids)                                         1.435
## log1p(measles):log1p(gdp)                                              3.823
## log1p(measles):thinness_1_19_years                                     2.865
## log1p(measles):income_composition_of_resources                        -2.251
## log1p(measles):I(income_composition_of_resources^2)                    2.432
## log1p(measles):schooling                                              -0.492
## log1p(measles):I(schooling^2)                                         -0.120
## log1p(bmi):log1p(under_five_deaths)                                   -0.053
## log1p(bmi):log1p(polio)                                                0.309
## log1p(bmi):diphtheria                                                 -4.204
## log1p(bmi):log1p(hiv_aids)                                            -1.342
## log1p(bmi):log1p(gdp)                                                 -1.188
## log1p(bmi):thinness_1_19_years                                        -0.491
## log1p(bmi):income_composition_of_resources                            -0.629
## log1p(bmi):I(income_composition_of_resources^2)                        0.854
## log1p(bmi):schooling                                                  -0.033
## log1p(bmi):I(schooling^2)                                             -0.792
## log1p(under_five_deaths):log1p(polio)                                 -1.579
## log1p(under_five_deaths):diphtheria                                    2.166
## log1p(under_five_deaths):log1p(hiv_aids)                              -2.342
## log1p(under_five_deaths):log1p(gdp)                                    2.018
## log1p(under_five_deaths):thinness_1_19_years                          -0.026
## log1p(under_five_deaths):income_composition_of_resources               0.402
## log1p(under_five_deaths):I(income_composition_of_resources^2)         -0.044
## log1p(under_five_deaths):schooling                                    -2.326
## log1p(under_five_deaths):I(schooling^2)                                2.207
## log1p(polio):diphtheria                                                1.711
## log1p(polio):log1p(hiv_aids)                                           1.597
## log1p(polio):log1p(gdp)                                                1.233
## log1p(polio):thinness_1_19_years                                       0.173
## log1p(polio):income_composition_of_resources                          -0.465
## log1p(polio):I(income_composition_of_resources^2)                      0.168
## log1p(polio):schooling                                                -1.602
## log1p(polio):I(schooling^2)                                            1.127
## diphtheria:log1p(hiv_aids)                                            -3.555
## diphtheria:log1p(gdp)                                                  2.867
## diphtheria:thinness_1_19_years                                        -0.732
## diphtheria:income_composition_of_resources                             0.819
## diphtheria:I(income_composition_of_resources^2)                       -0.836
## diphtheria:schooling                                                   0.099
## diphtheria:I(schooling^2)                                             -0.027
## log1p(hiv_aids):log1p(gdp)                                            -2.959
## log1p(hiv_aids):thinness_1_19_years                                   -2.238
## log1p(hiv_aids):income_composition_of_resources                       -1.909
## log1p(hiv_aids):I(income_composition_of_resources^2)                   1.017
## log1p(hiv_aids):schooling                                             -1.141
## log1p(hiv_aids):I(schooling^2)                                         1.957
## log1p(gdp):thinness_1_19_years                                         0.357
## log1p(gdp):income_composition_of_resources                            -1.872
## log1p(gdp):I(income_composition_of_resources^2)                        1.187
## log1p(gdp):schooling                                                   1.098
## log1p(gdp):I(schooling^2)                                             -1.142
## thinness_1_19_years:income_composition_of_resources                    3.308
## thinness_1_19_years:I(income_composition_of_resources^2)              -3.338
## thinness_1_19_years:schooling                                         -0.640
## thinness_1_19_years:I(schooling^2)                                     1.070
## income_composition_of_resources:I(income_composition_of_resources^2)  -5.262
## income_composition_of_resources:schooling                              0.131
## income_composition_of_resources:I(schooling^2)                        -4.119
## I(income_composition_of_resources^2):schooling                         0.641
## I(income_composition_of_resources^2):I(schooling^2)                    3.562
## schooling:I(schooling^2)                                              -2.159
##                                                                      Pr(>|t|)
## (Intercept)                                                          0.879329
## statusDeveloping                                                     0.335403
## log1p(adult_mortality)                                               0.055785
## log1p(infant_deaths)                                                 0.300976
## log1p(percentage_expenditure)                                        0.001972
## log1p(measles)                                                       0.119853
## log1p(bmi)                                                           0.118419
## log1p(under_five_deaths)                                             0.397347
## log1p(polio)                                                         0.014996
## diphtheria                                                           0.688601
## log1p(hiv_aids)                                                      0.610653
## log1p(gdp)                                                           0.601881
## thinness_1_19_years                                                  4.87e-07
## income_composition_of_resources                                      0.091738
## I(income_composition_of_resources^2)                                 0.310162
## schooling                                                            0.412522
## I(schooling^2)                                                       0.058794
## statusDeveloping:log1p(adult_mortality)                              0.137773
## statusDeveloping:log1p(infant_deaths)                                0.733943
## statusDeveloping:log1p(percentage_expenditure)                       0.871252
## statusDeveloping:log1p(measles)                                      0.000141
## statusDeveloping:log1p(bmi)                                          0.305069
## statusDeveloping:log1p(under_five_deaths)                            0.996814
## statusDeveloping:log1p(polio)                                        0.589225
## statusDeveloping:diphtheria                                          0.420289
## statusDeveloping:log1p(hiv_aids)                                           NA
## statusDeveloping:log1p(gdp)                                          0.770399
## statusDeveloping:thinness_1_19_years                                 3.25e-10
## statusDeveloping:income_composition_of_resources                     0.054136
## statusDeveloping:I(income_composition_of_resources^2)                0.050854
## statusDeveloping:schooling                                           0.080253
## statusDeveloping:I(schooling^2)                                      0.159393
## log1p(adult_mortality):log1p(infant_deaths)                          0.789272
## log1p(adult_mortality):log1p(percentage_expenditure)                 0.512096
## log1p(adult_mortality):log1p(measles)                                0.435373
## log1p(adult_mortality):log1p(bmi)                                    0.315268
## log1p(adult_mortality):log1p(under_five_deaths)                      0.634619
## log1p(adult_mortality):log1p(polio)                                  0.832018
## log1p(adult_mortality):diphtheria                                    0.850343
## log1p(adult_mortality):log1p(hiv_aids)                               0.010285
## log1p(adult_mortality):log1p(gdp)                                    0.028086
## log1p(adult_mortality):thinness_1_19_years                           0.005330
## log1p(adult_mortality):income_composition_of_resources               0.268465
## log1p(adult_mortality):I(income_composition_of_resources^2)          0.064935
## log1p(adult_mortality):schooling                                     0.119845
## log1p(adult_mortality):I(schooling^2)                                0.586645
## log1p(infant_deaths):log1p(percentage_expenditure)                   0.094605
## log1p(infant_deaths):log1p(measles)                                  0.241757
## log1p(infant_deaths):log1p(bmi)                                      0.892351
## log1p(infant_deaths):log1p(under_five_deaths)                        0.527359
## log1p(infant_deaths):log1p(polio)                                    0.145744
## log1p(infant_deaths):diphtheria                                      0.042242
## log1p(infant_deaths):log1p(hiv_aids)                                 0.015523
## log1p(infant_deaths):log1p(gdp)                                      0.028853
## log1p(infant_deaths):thinness_1_19_years                             0.779683
## log1p(infant_deaths):income_composition_of_resources                 0.945342
## log1p(infant_deaths):I(income_composition_of_resources^2)            0.829117
## log1p(infant_deaths):schooling                                       0.019190
## log1p(infant_deaths):I(schooling^2)                                  0.025149
## log1p(percentage_expenditure):log1p(measles)                         1.03e-06
## log1p(percentage_expenditure):log1p(bmi)                             0.002199
## log1p(percentage_expenditure):log1p(under_five_deaths)               0.174627
## log1p(percentage_expenditure):log1p(polio)                           0.000778
## log1p(percentage_expenditure):diphtheria                             0.113121
## log1p(percentage_expenditure):log1p(hiv_aids)                        0.955633
## log1p(percentage_expenditure):log1p(gdp)                             0.641407
## log1p(percentage_expenditure):thinness_1_19_years                    0.643074
## log1p(percentage_expenditure):income_composition_of_resources        0.507074
## log1p(percentage_expenditure):I(income_composition_of_resources^2)   0.268180
## log1p(percentage_expenditure):schooling                              0.072414
## log1p(percentage_expenditure):I(schooling^2)                         0.246134
## log1p(measles):log1p(bmi)                                            0.111583
## log1p(measles):log1p(under_five_deaths)                              0.260155
## log1p(measles):log1p(polio)                                          0.576428
## log1p(measles):diphtheria                                            0.559101
## log1p(measles):log1p(hiv_aids)                                       0.151484
## log1p(measles):log1p(gdp)                                            0.000135
## log1p(measles):thinness_1_19_years                                   0.004207
## log1p(measles):income_composition_of_resources                       0.024474
## log1p(measles):I(income_composition_of_resources^2)                  0.015071
## log1p(measles):schooling                                             0.623067
## log1p(measles):I(schooling^2)                                        0.904823
## log1p(bmi):log1p(under_five_deaths)                                  0.957702
## log1p(bmi):log1p(polio)                                              0.757393
## log1p(bmi):diphtheria                                                2.71e-05
## log1p(bmi):log1p(hiv_aids)                                           0.179631
## log1p(bmi):log1p(gdp)                                                0.234794
## log1p(bmi):thinness_1_19_years                                       0.623139
## log1p(bmi):income_composition_of_resources                           0.529430
## log1p(bmi):I(income_composition_of_resources^2)                      0.393142
## log1p(bmi):schooling                                                 0.973302
## log1p(bmi):I(schooling^2)                                            0.428599
## log1p(under_five_deaths):log1p(polio)                                0.114392
## log1p(under_five_deaths):diphtheria                                  0.030392
## log1p(under_five_deaths):log1p(hiv_aids)                             0.019278
## log1p(under_five_deaths):log1p(gdp)                                  0.043712
## log1p(under_five_deaths):thinness_1_19_years                         0.979464
## log1p(under_five_deaths):income_composition_of_resources             0.687403
## log1p(under_five_deaths):I(income_composition_of_resources^2)        0.964531
## log1p(under_five_deaths):schooling                                   0.020105
## log1p(under_five_deaths):I(schooling^2)                              0.027412
## log1p(polio):diphtheria                                              0.087262
## log1p(polio):log1p(hiv_aids)                                         0.110386
## log1p(polio):log1p(gdp)                                              0.217716
## log1p(polio):thinness_1_19_years                                     0.862415
## log1p(polio):income_composition_of_resources                         0.641745
## log1p(polio):I(income_composition_of_resources^2)                    0.866357
## log1p(polio):schooling                                               0.109215
## log1p(polio):I(schooling^2)                                          0.260057
## diphtheria:log1p(hiv_aids)                                           0.000385
## diphtheria:log1p(gdp)                                                0.004185
## diphtheria:thinness_1_19_years                                       0.464504
## diphtheria:income_composition_of_resources                           0.412986
## diphtheria:I(income_composition_of_resources^2)                      0.403482
## diphtheria:schooling                                                 0.921060
## diphtheria:I(schooling^2)                                            0.978124
## log1p(hiv_aids):log1p(gdp)                                           0.003113
## log1p(hiv_aids):thinness_1_19_years                                  0.025331
## log1p(hiv_aids):income_composition_of_resources                      0.056409
## log1p(hiv_aids):I(income_composition_of_resources^2)                 0.309194
## log1p(hiv_aids):schooling                                            0.254007
## log1p(hiv_aids):I(schooling^2)                                       0.050482
## log1p(gdp):thinness_1_19_years                                       0.720981
## log1p(gdp):income_composition_of_resources                           0.061310
## log1p(gdp):I(income_composition_of_resources^2)                      0.235266
## log1p(gdp):schooling                                                 0.272155
## log1p(gdp):I(schooling^2)                                            0.253411
## thinness_1_19_years:income_composition_of_resources                  0.000952
## thinness_1_19_years:I(income_composition_of_resources^2)             0.000856
## thinness_1_19_years:schooling                                        0.522213
## thinness_1_19_years:I(schooling^2)                                   0.284839
## income_composition_of_resources:I(income_composition_of_resources^2) 1.54e-07
## income_composition_of_resources:schooling                            0.895920
## income_composition_of_resources:I(schooling^2)                       3.94e-05
## I(income_composition_of_resources^2):schooling                       0.521373
## I(income_composition_of_resources^2):I(schooling^2)                  0.000374
## schooling:I(schooling^2)                                             0.030920
##                                                                         
## (Intercept)                                                             
## statusDeveloping                                                        
## log1p(adult_mortality)                                               .  
## log1p(infant_deaths)                                                    
## log1p(percentage_expenditure)                                        ** 
## log1p(measles)                                                          
## log1p(bmi)                                                              
## log1p(under_five_deaths)                                                
## log1p(polio)                                                         *  
## diphtheria                                                              
## log1p(hiv_aids)                                                         
## log1p(gdp)                                                              
## thinness_1_19_years                                                  ***
## income_composition_of_resources                                      .  
## I(income_composition_of_resources^2)                                    
## schooling                                                               
## I(schooling^2)                                                       .  
## statusDeveloping:log1p(adult_mortality)                                 
## statusDeveloping:log1p(infant_deaths)                                   
## statusDeveloping:log1p(percentage_expenditure)                          
## statusDeveloping:log1p(measles)                                      ***
## statusDeveloping:log1p(bmi)                                             
## statusDeveloping:log1p(under_five_deaths)                               
## statusDeveloping:log1p(polio)                                           
## statusDeveloping:diphtheria                                             
## statusDeveloping:log1p(hiv_aids)                                        
## statusDeveloping:log1p(gdp)                                             
## statusDeveloping:thinness_1_19_years                                 ***
## statusDeveloping:income_composition_of_resources                     .  
## statusDeveloping:I(income_composition_of_resources^2)                .  
## statusDeveloping:schooling                                           .  
## statusDeveloping:I(schooling^2)                                         
## log1p(adult_mortality):log1p(infant_deaths)                             
## log1p(adult_mortality):log1p(percentage_expenditure)                    
## log1p(adult_mortality):log1p(measles)                                   
## log1p(adult_mortality):log1p(bmi)                                       
## log1p(adult_mortality):log1p(under_five_deaths)                         
## log1p(adult_mortality):log1p(polio)                                     
## log1p(adult_mortality):diphtheria                                       
## log1p(adult_mortality):log1p(hiv_aids)                               *  
## log1p(adult_mortality):log1p(gdp)                                    *  
## log1p(adult_mortality):thinness_1_19_years                           ** 
## log1p(adult_mortality):income_composition_of_resources                  
## log1p(adult_mortality):I(income_composition_of_resources^2)          .  
## log1p(adult_mortality):schooling                                        
## log1p(adult_mortality):I(schooling^2)                                   
## log1p(infant_deaths):log1p(percentage_expenditure)                   .  
## log1p(infant_deaths):log1p(measles)                                     
## log1p(infant_deaths):log1p(bmi)                                         
## log1p(infant_deaths):log1p(under_five_deaths)                           
## log1p(infant_deaths):log1p(polio)                                       
## log1p(infant_deaths):diphtheria                                      *  
## log1p(infant_deaths):log1p(hiv_aids)                                 *  
## log1p(infant_deaths):log1p(gdp)                                      *  
## log1p(infant_deaths):thinness_1_19_years                                
## log1p(infant_deaths):income_composition_of_resources                    
## log1p(infant_deaths):I(income_composition_of_resources^2)               
## log1p(infant_deaths):schooling                                       *  
## log1p(infant_deaths):I(schooling^2)                                  *  
## log1p(percentage_expenditure):log1p(measles)                         ***
## log1p(percentage_expenditure):log1p(bmi)                             ** 
## log1p(percentage_expenditure):log1p(under_five_deaths)                  
## log1p(percentage_expenditure):log1p(polio)                           ***
## log1p(percentage_expenditure):diphtheria                                
## log1p(percentage_expenditure):log1p(hiv_aids)                           
## log1p(percentage_expenditure):log1p(gdp)                                
## log1p(percentage_expenditure):thinness_1_19_years                       
## log1p(percentage_expenditure):income_composition_of_resources           
## log1p(percentage_expenditure):I(income_composition_of_resources^2)      
## log1p(percentage_expenditure):schooling                              .  
## log1p(percentage_expenditure):I(schooling^2)                            
## log1p(measles):log1p(bmi)                                               
## log1p(measles):log1p(under_five_deaths)                                 
## log1p(measles):log1p(polio)                                             
## log1p(measles):diphtheria                                               
## log1p(measles):log1p(hiv_aids)                                          
## log1p(measles):log1p(gdp)                                            ***
## log1p(measles):thinness_1_19_years                                   ** 
## log1p(measles):income_composition_of_resources                       *  
## log1p(measles):I(income_composition_of_resources^2)                  *  
## log1p(measles):schooling                                                
## log1p(measles):I(schooling^2)                                           
## log1p(bmi):log1p(under_five_deaths)                                     
## log1p(bmi):log1p(polio)                                                 
## log1p(bmi):diphtheria                                                ***
## log1p(bmi):log1p(hiv_aids)                                              
## log1p(bmi):log1p(gdp)                                                   
## log1p(bmi):thinness_1_19_years                                          
## log1p(bmi):income_composition_of_resources                              
## log1p(bmi):I(income_composition_of_resources^2)                         
## log1p(bmi):schooling                                                    
## log1p(bmi):I(schooling^2)                                               
## log1p(under_five_deaths):log1p(polio)                                   
## log1p(under_five_deaths):diphtheria                                  *  
## log1p(under_five_deaths):log1p(hiv_aids)                             *  
## log1p(under_five_deaths):log1p(gdp)                                  *  
## log1p(under_five_deaths):thinness_1_19_years                            
## log1p(under_five_deaths):income_composition_of_resources                
## log1p(under_five_deaths):I(income_composition_of_resources^2)           
## log1p(under_five_deaths):schooling                                   *  
## log1p(under_five_deaths):I(schooling^2)                              *  
## log1p(polio):diphtheria                                              .  
## log1p(polio):log1p(hiv_aids)                                            
## log1p(polio):log1p(gdp)                                                 
## log1p(polio):thinness_1_19_years                                        
## log1p(polio):income_composition_of_resources                            
## log1p(polio):I(income_composition_of_resources^2)                       
## log1p(polio):schooling                                                  
## log1p(polio):I(schooling^2)                                             
## diphtheria:log1p(hiv_aids)                                           ***
## diphtheria:log1p(gdp)                                                ** 
## diphtheria:thinness_1_19_years                                          
## diphtheria:income_composition_of_resources                              
## diphtheria:I(income_composition_of_resources^2)                         
## diphtheria:schooling                                                    
## diphtheria:I(schooling^2)                                               
## log1p(hiv_aids):log1p(gdp)                                           ** 
## log1p(hiv_aids):thinness_1_19_years                                  *  
## log1p(hiv_aids):income_composition_of_resources                      .  
## log1p(hiv_aids):I(income_composition_of_resources^2)                    
## log1p(hiv_aids):schooling                                               
## log1p(hiv_aids):I(schooling^2)                                       .  
## log1p(gdp):thinness_1_19_years                                          
## log1p(gdp):income_composition_of_resources                           .  
## log1p(gdp):I(income_composition_of_resources^2)                         
## log1p(gdp):schooling                                                    
## log1p(gdp):I(schooling^2)                                               
## thinness_1_19_years:income_composition_of_resources                  ***
## thinness_1_19_years:I(income_composition_of_resources^2)             ***
## thinness_1_19_years:schooling                                           
## thinness_1_19_years:I(schooling^2)                                      
## income_composition_of_resources:I(income_composition_of_resources^2) ***
## income_composition_of_resources:schooling                               
## income_composition_of_resources:I(schooling^2)                       ***
## I(income_composition_of_resources^2):schooling                          
## I(income_composition_of_resources^2):I(schooling^2)                  ***
## schooling:I(schooling^2)                                             *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.014 on 2499 degrees of freedom
## Multiple R-squared:  0.9047, Adjusted R-squared:  0.8996 
## F-statistic: 175.8 on 135 and 2499 DF,  p-value: < 2.2e-16

## Warning in predict.lm(aic_back_full_additive_model_log_poly_interactive, :
## prediction from a rank-deficient fit may be misleading
## [1] 3.006109
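
The rank-deficiency warning above typically appears when `lm()` could not estimate some coefficients because their columns are linear combinations of other columns. The following is a minimal sketch, on made-up toy data rather than the life expectancy data, of how one might list such aliased terms and compute a root mean squared error. The helper `rmse` and the data frame `toy` are hypothetical names used only for illustration, and this is not necessarily how the value printed above was produced.

```r
# Hypothetical helper (not from the original Rmd): root mean squared error.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy data with a deliberately redundant predictor: x3 is exactly x1 + x2.
set.seed(42)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$x3 <- toy$x1 + toy$x2
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(50, sd = 0.5)

fit <- lm(y ~ x1 + x2 + x3, data = toy)

# Coefficients that lm() could not estimate are reported as NA.
aliased_terms <- names(coef(fit))[is.na(coef(fit))]
print(aliased_terms)

# The fit is rank deficient when its rank is below the number of coefficients.
is_rank_deficient <- fit$rank < length(coef(fit))
print(is_rank_deficient)

# Training RMSE computed from the fitted values.
train_rmse <- rmse(toy$y, fitted(fit))
print(train_rmse)
```

If a term shows up as aliased, dropping it from the formula leaves the fitted values unchanged and removes the warning.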

Next, we ran a stepwise search that uses the pairwise-interaction model, built on the final model we have chosen (aic_back_full_additive_model_log_poly), as the initial model. The search is disabled in this report; to see its output, enable it in the RMD file.
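
Since the search itself is not shown, here is a minimal sketch of what an AIC-based backward search over a pairwise-interaction model can look like. It assumes a backward direction (suggested only by the `aic_back` prefix in the model names) and uses the built-in `mtcars` data with hypothetical object names, so it illustrates the technique rather than reproducing the authors' exact call.

```r
# Illustrative only: built-in mtcars data, hypothetical object names.
# Additive starting point with three predictors.
additive_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Expand to all main effects plus all pairwise interactions.
pairwise_fit <- lm(mpg ~ (wt + hp + qsec)^2, data = mtcars)

# Backward elimination by AIC (the default penalty is k = 2).
# trace = 0 suppresses the step-by-step log.
selected_fit <- step(pairwise_fit, direction = "backward", trace = 0)

# Inspect which terms survived and compare AIC values.
print(formula(selected_fit))
print(c(additive = AIC(additive_fit),
        pairwise = AIC(pairwise_fit),
        selected = AIC(selected_fit)))
```

Passing `k = log(nrow(mtcars))` to `step()` would switch the criterion from AIC to BIC, which usually keeps fewer interaction terms.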

Based on the above search, we fitted the model selected by AIC:

aic_back_log_poly <- lm(life_expectancy ~ status +
    log1p(adult_mortality) + log1p(infant_deaths) +
    log1p(percentage_expenditure) + log1p(measles) + log1p(bmi) +
    log1p(under_five_deaths) + log1p(polio) + diphtheria + log1p(hiv_aids) +
    log1p(gdp) + thinness_1_19_years + income_composition_of_resources +
    I(income_composition_of_resources^2) + schooling + I(schooling^2) +
    status:log1p(adult_mortality) + status:log1p(infant_deaths) +
    status:log1p(measles) + status:thinness_1_19_years + status:income_composition_of_resources +
    status:I(income_composition_of_resources^2) + status:schooling +
    status:I(schooling^2) + log1p(adult_mortality):log1p(measles) +
    log1p(adult_mortality):log1p(hiv_aids) + log1p(adult_mortality):log1p(gdp) +
    log1p(adult_mortality):thinness_1_19_years + log1p(adult_mortality):income_composition_of_resources +
    log1p(adult_mortality):I(income_composition_of_resources^2) +
    log1p(adult_mortality):schooling + log1p(infant_deaths):log1p(percentage_expenditure) +
    log1p(infant_deaths):log1p(measles) + log1p(infant_deaths):log1p(hiv_aids) +
    log1p(infant_deaths):log1p(gdp) + log1p(infant_deaths):schooling +
    log1p(infant_deaths):I(schooling^2) + log1p(percentage_expenditure):log1p(measles) +
    log1p(percentage_expenditure):log1p(bmi) + log1p(percentage_expenditure):log1p(polio) +
    log1p(percentage_expenditure):diphtheria + log1p(percentage_expenditure):schooling +
    log1p(percentage_expenditure):I(schooling^2) + log1p(measles):log1p(bmi) +
    log1p(measles):log1p(under_five_deaths) + log1p(measles):log1p(gdp) +
    log1p(measles):thinness_1_19_years + log1p(measles):income_composition_of_resources +
    log1p(measles):I(income_composition_of_resources^2) + log1p(bmi):diphtheria +
    log1p(bmi):log1p(hiv_aids) + log1p(bmi):I(schooling^2) +
    log1p(under_five_deaths):log1p(polio) + log1p(under_five_deaths):diphtheria +
    log1p(under_five_deaths):log1p(hiv_aids) + log1p(under_five_deaths):log1p(gdp) +
    log1p(under_five_deaths):thinness_1_19_years + log1p(under_five_deaths):income_composition_of_resources +
    log1p(under_five_deaths):schooling + log1p(under_five_deaths):I(schooling^2) +
    log1p(polio):schooling + log1p(polio):I(schooling^2) + diphtheria:log1p(hiv_aids) +
    diphtheria:log1p(gdp) + log1p(hiv_aids):log1p(gdp) + log1p(hiv_aids):thinness_1_19_years +
    log1p(hiv_aids):income_composition_of_resources + log1p(hiv_aids):I(income_composition_of_resources^2) +
    log1p(hiv_aids):I(schooling^2) + log1p(gdp):income_composition_of_resources +
    thinness_1_19_years:income_composition_of_resources + thinness_1_19_years:I(income_composition_of_resources^2) +
    income_composition_of_resources:I(income_composition_of_resources^2) +
    income_composition_of_resources:I(schooling^2) + I(income_composition_of_resources^2):I(schooling^2) +
    schooling:I(schooling^2), data = non_cat_predictor_df)

Summary:

## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2) + 
##     status:log1p(adult_mortality) + status:log1p(infant_deaths) + 
##     status:log1p(measles) + status:thinness_1_19_years + status:income_composition_of_resources + 
##     status:I(income_composition_of_resources^2) + status:schooling + 
##     status:I(schooling^2) + log1p(adult_mortality):log1p(measles) + 
##     log1p(adult_mortality):log1p(hiv_aids) + log1p(adult_mortality):log1p(gdp) + 
##     log1p(adult_mortality):thinness_1_19_years + log1p(adult_mortality):income_composition_of_resources + 
##     log1p(adult_mortality):I(income_composition_of_resources^2) + 
##     log1p(adult_mortality):schooling + log1p(infant_deaths):log1p(percentage_expenditure) + 
##     log1p(infant_deaths):log1p(measles) + log1p(infant_deaths):log1p(hiv_aids) + 
##     log1p(infant_deaths):log1p(gdp) + log1p(infant_deaths):schooling + 
##     log1p(infant_deaths):I(schooling^2) + log1p(percentage_expenditure):log1p(measles) + 
##     log1p(percentage_expenditure):log1p(bmi) + log1p(percentage_expenditure):log1p(polio) + 
##     log1p(percentage_expenditure):diphtheria + log1p(percentage_expenditure):schooling + 
##     log1p(percentage_expenditure):I(schooling^2) + log1p(measles):log1p(bmi) + 
##     log1p(measles):log1p(under_five_deaths) + log1p(measles):log1p(gdp) + 
##     log1p(measles):thinness_1_19_years + log1p(measles):income_composition_of_resources + 
##     log1p(measles):I(income_composition_of_resources^2) + log1p(bmi):diphtheria + 
##     log1p(bmi):log1p(hiv_aids) + log1p(bmi):I(schooling^2) + 
##     log1p(under_five_deaths):log1p(polio) + log1p(under_five_deaths):diphtheria + 
##     log1p(under_five_deaths):log1p(hiv_aids) + log1p(under_five_deaths):log1p(gdp) + 
##     log1p(under_five_deaths):thinness_1_19_years + log1p(under_five_deaths):income_composition_of_resources + 
##     log1p(under_five_deaths):schooling + log1p(under_five_deaths):I(schooling^2) + 
##     log1p(polio):schooling + log1p(polio):I(schooling^2) + diphtheria:log1p(hiv_aids) + 
##     diphtheria:log1p(gdp) + log1p(hiv_aids):log1p(gdp) + log1p(hiv_aids):thinness_1_19_years + 
##     log1p(hiv_aids):income_composition_of_resources + log1p(hiv_aids):I(income_composition_of_resources^2) + 
##     log1p(hiv_aids):I(schooling^2) + log1p(gdp):income_composition_of_resources + 
##     thinness_1_19_years:income_composition_of_resources + thinness_1_19_years:I(income_composition_of_resources^2) + 
##     income_composition_of_resources:I(income_composition_of_resources^2) + 
##     income_composition_of_resources:I(schooling^2) + I(income_composition_of_resources^2):I(schooling^2) + 
##     schooling:I(schooling^2), data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.9751  -1.6591  -0.0355   1.5194  13.8922 
## 
## Coefficients:
##                                                                        Estimate
## (Intercept)                                                           9.639e+00
## statusDeveloping                                                      2.296e+01
## log1p(adult_mortality)                                                1.967e+00
## log1p(infant_deaths)                                                 -4.694e+00
## log1p(percentage_expenditure)                                         1.183e+00
## log1p(measles)                                                        7.240e-01
## log1p(bmi)                                                            2.619e+00
## log1p(under_five_deaths)                                              2.320e+00
## log1p(polio)                                                          4.342e+00
## diphtheria                                                            4.783e-02
## log1p(hiv_aids)                                                       3.016e-01
## log1p(gdp)                                                            2.965e-01
## thinness_1_19_years                                                  -1.807e+00
## income_composition_of_resources                                       1.425e+02
## I(income_composition_of_resources^2)                                  2.880e+01
## schooling                                                            -3.531e+00
## I(schooling^2)                                                        4.202e-01
## statusDeveloping:log1p(adult_mortality)                              -3.824e-01
## statusDeveloping:log1p(infant_deaths)                                 9.245e-01
## statusDeveloping:log1p(measles)                                      -4.578e-01
## statusDeveloping:thinness_1_19_years                                  1.474e+00
## statusDeveloping:income_composition_of_resources                     -1.387e+02
## statusDeveloping:I(income_composition_of_resources^2)                 8.349e+01
## statusDeveloping:schooling                                            3.641e+00
## statusDeveloping:I(schooling^2)                                      -9.142e-02
## log1p(adult_mortality):log1p(measles)                                -5.218e-02
## log1p(adult_mortality):log1p(hiv_aids)                                2.598e-01
## log1p(adult_mortality):log1p(gdp)                                    -1.176e-01
## log1p(adult_mortality):thinness_1_19_years                            5.159e-02
## log1p(adult_mortality):income_composition_of_resources               -4.385e+00
## log1p(adult_mortality):I(income_composition_of_resources^2)           6.386e+00
## log1p(adult_mortality):schooling                                     -1.167e-01
## log1p(infant_deaths):log1p(percentage_expenditure)                    5.578e-02
## log1p(infant_deaths):log1p(measles)                                   1.739e-01
## log1p(infant_deaths):log1p(hiv_aids)                                  3.296e+00
## log1p(infant_deaths):log1p(gdp)                                      -6.629e-01
## log1p(infant_deaths):schooling                                        1.956e+00
## log1p(infant_deaths):I(schooling^2)                                  -8.841e-02
## log1p(percentage_expenditure):log1p(measles)                         -5.049e-02
## log1p(percentage_expenditure):log1p(bmi)                              1.314e-01
## log1p(percentage_expenditure):log1p(polio)                           -1.767e-01
## log1p(percentage_expenditure):diphtheria                             -2.966e-03
## log1p(percentage_expenditure):schooling                              -8.020e-02
## log1p(percentage_expenditure):I(schooling^2)                          3.075e-03
## log1p(measles):log1p(bmi)                                            -5.597e-02
## log1p(measles):log1p(under_five_deaths)                              -1.688e-01
## log1p(measles):log1p(gdp)                                             8.083e-02
## log1p(measles):thinness_1_19_years                                    2.588e-02
## log1p(measles):income_composition_of_resources                       -1.412e+00
## log1p(measles):I(income_composition_of_resources^2)                   1.007e+00
## log1p(bmi):diphtheria                                                -2.491e-02
## log1p(bmi):log1p(hiv_aids)                                           -2.893e-01
## log1p(bmi):I(schooling^2)                                            -5.649e-03
## log1p(under_five_deaths):log1p(polio)                                -1.824e-01
## log1p(under_five_deaths):diphtheria                                   5.341e-03
## log1p(under_five_deaths):log1p(hiv_aids)                             -2.974e+00
## log1p(under_five_deaths):log1p(gdp)                                   5.394e-01
## log1p(under_five_deaths):thinness_1_19_years                         -5.154e-02
## log1p(under_five_deaths):income_composition_of_resources              3.119e+00
## log1p(under_five_deaths):schooling                                   -1.764e+00
## log1p(under_five_deaths):I(schooling^2)                               7.728e-02
## log1p(polio):schooling                                               -5.213e-01
## log1p(polio):I(schooling^2)                                           1.963e-02
## diphtheria:log1p(hiv_aids)                                           -1.230e-02
## diphtheria:log1p(gdp)                                                 8.433e-03
## log1p(hiv_aids):log1p(gdp)                                           -2.744e-01
## log1p(hiv_aids):thinness_1_19_years                                  -5.530e-02
## log1p(hiv_aids):income_composition_of_resources                      -1.138e+01
## log1p(hiv_aids):I(income_composition_of_resources^2)                  1.027e+01
## log1p(hiv_aids):I(schooling^2)                                        1.421e-02
## log1p(gdp):income_composition_of_resources                           -8.935e-01
## thinness_1_19_years:income_composition_of_resources                   1.292e+00
## thinness_1_19_years:I(income_composition_of_resources^2)             -1.650e+00
## income_composition_of_resources:I(income_composition_of_resources^2) -1.246e+02
## income_composition_of_resources:I(schooling^2)                       -4.184e-01
## I(income_composition_of_resources^2):I(schooling^2)                   4.918e-01
## schooling:I(schooling^2)                                             -1.294e-02
##                                                                      Std. Error
## (Intercept)                                                           2.727e+01
## statusDeveloping                                                      2.629e+01
## log1p(adult_mortality)                                                6.847e-01
## log1p(infant_deaths)                                                  5.111e+00
## log1p(percentage_expenditure)                                         3.508e-01
## log1p(measles)                                                        2.748e-01
## log1p(bmi)                                                            4.876e-01
## log1p(under_five_deaths)                                              4.721e+00
## log1p(polio)                                                          1.209e+00
## diphtheria                                                            2.330e-02
## log1p(hiv_aids)                                                       1.133e+00
## log1p(gdp)                                                            3.964e-01
## thinness_1_19_years                                                   2.591e-01
## income_composition_of_resources                                       7.402e+01
## I(income_composition_of_resources^2)                                  5.257e+01
## schooling                                                             2.005e+00
## I(schooling^2)                                                        1.047e-01
## statusDeveloping:log1p(adult_mortality)                               2.383e-01
## statusDeveloping:log1p(infant_deaths)                                 2.908e-01
## statusDeveloping:log1p(measles)                                       9.554e-02
## statusDeveloping:thinness_1_19_years                                  2.339e-01
## statusDeveloping:income_composition_of_resources                      7.116e+01
## statusDeveloping:I(income_composition_of_resources^2)                 4.313e+01
## statusDeveloping:schooling                                            1.625e+00
## statusDeveloping:I(schooling^2)                                       5.283e-02
## log1p(adult_mortality):log1p(measles)                                 2.300e-02
## log1p(adult_mortality):log1p(hiv_aids)                                9.656e-02
## log1p(adult_mortality):log1p(gdp)                                     5.278e-02
## log1p(adult_mortality):thinness_1_19_years                            1.826e-02
## log1p(adult_mortality):income_composition_of_resources                1.488e+00
## log1p(adult_mortality):I(income_composition_of_resources^2)           1.769e+00
## log1p(adult_mortality):schooling                                      4.487e-02
## log1p(infant_deaths):log1p(percentage_expenditure)                    2.414e-02
## log1p(infant_deaths):log1p(measles)                                   1.769e-01
## log1p(infant_deaths):log1p(hiv_aids)                                  1.048e+00
## log1p(infant_deaths):log1p(gdp)                                       3.363e-01
## log1p(infant_deaths):schooling                                        8.173e-01
## log1p(infant_deaths):I(schooling^2)                                   3.412e-02
## log1p(percentage_expenditure):log1p(measles)                          1.054e-02
## log1p(percentage_expenditure):log1p(bmi)                              3.649e-02
## log1p(percentage_expenditure):log1p(polio)                            5.587e-02
## log1p(percentage_expenditure):diphtheria                              1.460e-03
## log1p(percentage_expenditure):schooling                               4.126e-02
## log1p(percentage_expenditure):I(schooling^2)                          1.684e-03
## log1p(measles):log1p(bmi)                                             3.135e-02
## log1p(measles):log1p(under_five_deaths)                               1.716e-01
## log1p(measles):log1p(gdp)                                             1.821e-02
## log1p(measles):thinness_1_19_years                                    8.117e-03
## log1p(measles):income_composition_of_resources                        5.467e-01
## log1p(measles):I(income_composition_of_resources^2)                   5.585e-01
## log1p(bmi):diphtheria                                                 4.527e-03
## log1p(bmi):log1p(hiv_aids)                                            1.564e-01
## log1p(bmi):I(schooling^2)                                             1.617e-03
## log1p(under_five_deaths):log1p(polio)                                 8.547e-02
## log1p(under_five_deaths):diphtheria                                   2.223e-03
## log1p(under_five_deaths):log1p(hiv_aids)                              1.005e+00
## log1p(under_five_deaths):log1p(gdp)                                   3.198e-01
## log1p(under_five_deaths):thinness_1_19_years                          1.363e-02
## log1p(under_five_deaths):income_composition_of_resources              5.717e-01
## log1p(under_five_deaths):schooling                                    7.350e-01
## log1p(under_five_deaths):I(schooling^2)                               3.068e-02
## log1p(polio):schooling                                                1.960e-01
## log1p(polio):I(schooling^2)                                           8.841e-03
## diphtheria:log1p(hiv_aids)                                            4.433e-03
## diphtheria:log1p(gdp)                                                 2.214e-03
## log1p(hiv_aids):log1p(gdp)                                            7.960e-02
## log1p(hiv_aids):thinness_1_19_years                                   2.660e-02
## log1p(hiv_aids):income_composition_of_resources                       3.544e+00
## log1p(hiv_aids):I(income_composition_of_resources^2)                  4.594e+00
## log1p(hiv_aids):I(schooling^2)                                        4.772e-03
## log1p(gdp):income_composition_of_resources                            2.998e-01
## thinness_1_19_years:income_composition_of_resources                   3.419e-01
## thinness_1_19_years:I(income_composition_of_resources^2)              4.217e-01
## income_composition_of_resources:I(income_composition_of_resources^2)  2.101e+01
## income_composition_of_resources:I(schooling^2)                        5.080e-02
## I(income_composition_of_resources^2):I(schooling^2)                   6.618e-02
## schooling:I(schooling^2)                                              2.406e-03
##                                                                      t value
## (Intercept)                                                            0.353
## statusDeveloping                                                       0.873
## log1p(adult_mortality)                                                 2.872
## log1p(infant_deaths)                                                  -0.918
## log1p(percentage_expenditure)                                          3.373
## log1p(measles)                                                         2.635
## log1p(bmi)                                                             5.371
## log1p(under_five_deaths)                                               0.491
## log1p(polio)                                                           3.591
## diphtheria                                                             2.052
## log1p(hiv_aids)                                                        0.266
## log1p(gdp)                                                             0.748
## thinness_1_19_years                                                   -6.973
## income_composition_of_resources                                        1.925
## I(income_composition_of_resources^2)                                   0.548
## schooling                                                             -1.761
## I(schooling^2)                                                         4.015
## statusDeveloping:log1p(adult_mortality)                               -1.604
## statusDeveloping:log1p(infant_deaths)                                  3.179
## statusDeveloping:log1p(measles)                                       -4.791
## statusDeveloping:thinness_1_19_years                                   6.299
## statusDeveloping:income_composition_of_resources                      -1.949
## statusDeveloping:I(income_composition_of_resources^2)                  1.936
## statusDeveloping:schooling                                             2.240
## statusDeveloping:I(schooling^2)                                       -1.730
## log1p(adult_mortality):log1p(measles)                                 -2.268
## log1p(adult_mortality):log1p(hiv_aids)                                 2.690
## log1p(adult_mortality):log1p(gdp)                                     -2.229
## log1p(adult_mortality):thinness_1_19_years                             2.825
## log1p(adult_mortality):income_composition_of_resources                -2.946
## log1p(adult_mortality):I(income_composition_of_resources^2)            3.609
## log1p(adult_mortality):schooling                                      -2.602
## log1p(infant_deaths):log1p(percentage_expenditure)                     2.311
## log1p(infant_deaths):log1p(measles)                                    0.983
## log1p(infant_deaths):log1p(hiv_aids)                                   3.146
## log1p(infant_deaths):log1p(gdp)                                       -1.971
## log1p(infant_deaths):schooling                                         2.393
## log1p(infant_deaths):I(schooling^2)                                   -2.591
## log1p(percentage_expenditure):log1p(measles)                          -4.791
## log1p(percentage_expenditure):log1p(bmi)                               3.601
## log1p(percentage_expenditure):log1p(polio)                            -3.163
## log1p(percentage_expenditure):diphtheria                              -2.032
## log1p(percentage_expenditure):schooling                               -1.944
## log1p(percentage_expenditure):I(schooling^2)                           1.827
## log1p(measles):log1p(bmi)                                             -1.785
## log1p(measles):log1p(under_five_deaths)                               -0.984
## log1p(measles):log1p(gdp)                                              4.438
## log1p(measles):thinness_1_19_years                                     3.188
## log1p(measles):income_composition_of_resources                        -2.582
## log1p(measles):I(income_composition_of_resources^2)                    1.804
## log1p(bmi):diphtheria                                                 -5.503
## log1p(bmi):log1p(hiv_aids)                                            -1.850
## log1p(bmi):I(schooling^2)                                             -3.493
## log1p(under_five_deaths):log1p(polio)                                 -2.134
## log1p(under_five_deaths):diphtheria                                    2.402
## log1p(under_five_deaths):log1p(hiv_aids)                              -2.960
## log1p(under_five_deaths):log1p(gdp)                                    1.687
## log1p(under_five_deaths):thinness_1_19_years                          -3.780
## log1p(under_five_deaths):income_composition_of_resources               5.457
## log1p(under_five_deaths):schooling                                    -2.400
## log1p(under_five_deaths):I(schooling^2)                                2.519
## log1p(polio):schooling                                                -2.659
## log1p(polio):I(schooling^2)                                            2.221
## diphtheria:log1p(hiv_aids)                                            -2.776
## diphtheria:log1p(gdp)                                                  3.809
## log1p(hiv_aids):log1p(gdp)                                            -3.447
## log1p(hiv_aids):thinness_1_19_years                                   -2.079
## log1p(hiv_aids):income_composition_of_resources                       -3.211
## log1p(hiv_aids):I(income_composition_of_resources^2)                   2.236
## log1p(hiv_aids):I(schooling^2)                                         2.978
## log1p(gdp):income_composition_of_resources                            -2.980
## thinness_1_19_years:income_composition_of_resources                    3.778
## thinness_1_19_years:I(income_composition_of_resources^2)              -3.913
## income_composition_of_resources:I(income_composition_of_resources^2)  -5.928
## income_composition_of_resources:I(schooling^2)                        -8.235
## I(income_composition_of_resources^2):I(schooling^2)                    7.431
## schooling:I(schooling^2)                                              -5.380
##                                                                      Pr(>|t|)
## (Intercept)                                                          0.723745
## statusDeveloping                                                     0.382500
## log1p(adult_mortality)                                               0.004114
## log1p(infant_deaths)                                                 0.358527
## log1p(percentage_expenditure)                                        0.000755
## log1p(measles)                                                       0.008466
## log1p(bmi)                                                           8.53e-08
## log1p(under_five_deaths)                                             0.623119
## log1p(polio)                                                         0.000335
## diphtheria                                                           0.040240
## log1p(hiv_aids)                                                      0.790149
## log1p(gdp)                                                           0.454547
## thinness_1_19_years                                                  3.94e-12
## income_composition_of_resources                                      0.054392
## I(income_composition_of_resources^2)                                 0.583812
## schooling                                                            0.078321
## I(schooling^2)                                                       6.12e-05
## statusDeveloping:log1p(adult_mortality)                              0.108780
## statusDeveloping:log1p(infant_deaths)                                0.001495
## statusDeveloping:log1p(measles)                                      1.75e-06
## statusDeveloping:thinness_1_19_years                                 3.52e-10
## statusDeveloping:income_composition_of_resources                     0.051393
## statusDeveloping:I(income_composition_of_resources^2)                0.053019
## statusDeveloping:schooling                                           0.025161
## statusDeveloping:I(schooling^2)                                      0.083679
## log1p(adult_mortality):log1p(measles)                                0.023387
## log1p(adult_mortality):log1p(hiv_aids)                               0.007188
## log1p(adult_mortality):log1p(gdp)                                    0.025918
## log1p(adult_mortality):thinness_1_19_years                           0.004770
## log1p(adult_mortality):income_composition_of_resources               0.003250
## log1p(adult_mortality):I(income_composition_of_resources^2)          0.000313
## log1p(adult_mortality):schooling                                     0.009333
## log1p(infant_deaths):log1p(percentage_expenditure)                   0.020921
## log1p(infant_deaths):log1p(measles)                                  0.325692
## log1p(infant_deaths):log1p(hiv_aids)                                 0.001672
## log1p(infant_deaths):log1p(gdp)                                      0.048846
## log1p(infant_deaths):schooling                                       0.016779
## log1p(infant_deaths):I(schooling^2)                                  0.009616
## log1p(percentage_expenditure):log1p(measles)                         1.75e-06
## log1p(percentage_expenditure):log1p(bmi)                             0.000322
## log1p(percentage_expenditure):log1p(polio)                           0.001581
## log1p(percentage_expenditure):diphtheria                             0.042294
## log1p(percentage_expenditure):schooling                              0.052047
## log1p(percentage_expenditure):I(schooling^2)                         0.067883
## log1p(measles):log1p(bmi)                                            0.074306
## log1p(measles):log1p(under_five_deaths)                              0.325237
## log1p(measles):log1p(gdp)                                            9.45e-06
## log1p(measles):thinness_1_19_years                                   0.001448
## log1p(measles):income_composition_of_resources                       0.009875
## log1p(measles):I(income_composition_of_resources^2)                  0.071421
## log1p(bmi):diphtheria                                                4.11e-08
## log1p(bmi):log1p(hiv_aids)                                           0.064495
## log1p(bmi):I(schooling^2)                                            0.000485
## log1p(under_five_deaths):log1p(polio)                                0.032958
## log1p(under_five_deaths):diphtheria                                  0.016361
## log1p(under_five_deaths):log1p(hiv_aids)                             0.003104
## log1p(under_five_deaths):log1p(gdp)                                  0.091790
## log1p(under_five_deaths):thinness_1_19_years                         0.000160
## log1p(under_five_deaths):income_composition_of_resources             5.32e-08
## log1p(under_five_deaths):schooling                                   0.016463
## log1p(under_five_deaths):I(schooling^2)                              0.011824
## log1p(polio):schooling                                               0.007878
## log1p(polio):I(schooling^2)                                          0.026447
## diphtheria:log1p(hiv_aids)                                           0.005548
## diphtheria:log1p(gdp)                                                0.000143
## log1p(hiv_aids):log1p(gdp)                                           0.000576
## log1p(hiv_aids):thinness_1_19_years                                  0.037737
## log1p(hiv_aids):income_composition_of_resources                      0.001338
## log1p(hiv_aids):I(income_composition_of_resources^2)                 0.025438
## log1p(hiv_aids):I(schooling^2)                                       0.002927
## log1p(gdp):income_composition_of_resources                           0.002911
## thinness_1_19_years:income_composition_of_resources                  0.000162
## thinness_1_19_years:I(income_composition_of_resources^2)             9.34e-05
## income_composition_of_resources:I(income_composition_of_resources^2) 3.48e-09
## income_composition_of_resources:I(schooling^2)                       2.82e-16
## I(income_composition_of_resources^2):I(schooling^2)                  1.46e-13
## schooling:I(schooling^2)                                             8.13e-08
##                                                                         
## (Intercept)                                                             
## statusDeveloping                                                        
## log1p(adult_mortality)                                               ** 
## log1p(infant_deaths)                                                    
## log1p(percentage_expenditure)                                        ***
## log1p(measles)                                                       ** 
## log1p(bmi)                                                           ***
## log1p(under_five_deaths)                                                
## log1p(polio)                                                         ***
## diphtheria                                                           *  
## log1p(hiv_aids)                                                         
## log1p(gdp)                                                              
## thinness_1_19_years                                                  ***
## income_composition_of_resources                                      .  
## I(income_composition_of_resources^2)                                    
## schooling                                                            .  
## I(schooling^2)                                                       ***
## statusDeveloping:log1p(adult_mortality)                                 
## statusDeveloping:log1p(infant_deaths)                                ** 
## statusDeveloping:log1p(measles)                                      ***
## statusDeveloping:thinness_1_19_years                                 ***
## statusDeveloping:income_composition_of_resources                     .  
## statusDeveloping:I(income_composition_of_resources^2)                .  
## statusDeveloping:schooling                                           *  
## statusDeveloping:I(schooling^2)                                      .  
## log1p(adult_mortality):log1p(measles)                                *  
## log1p(adult_mortality):log1p(hiv_aids)                               ** 
## log1p(adult_mortality):log1p(gdp)                                    *  
## log1p(adult_mortality):thinness_1_19_years                           ** 
## log1p(adult_mortality):income_composition_of_resources               ** 
## log1p(adult_mortality):I(income_composition_of_resources^2)          ***
## log1p(adult_mortality):schooling                                     ** 
## log1p(infant_deaths):log1p(percentage_expenditure)                   *  
## log1p(infant_deaths):log1p(measles)                                     
## log1p(infant_deaths):log1p(hiv_aids)                                 ** 
## log1p(infant_deaths):log1p(gdp)                                      *  
## log1p(infant_deaths):schooling                                       *  
## log1p(infant_deaths):I(schooling^2)                                  ** 
## log1p(percentage_expenditure):log1p(measles)                         ***
## log1p(percentage_expenditure):log1p(bmi)                             ***
## log1p(percentage_expenditure):log1p(polio)                           ** 
## log1p(percentage_expenditure):diphtheria                             *  
## log1p(percentage_expenditure):schooling                              .  
## log1p(percentage_expenditure):I(schooling^2)                         .  
## log1p(measles):log1p(bmi)                                            .  
## log1p(measles):log1p(under_five_deaths)                                 
## log1p(measles):log1p(gdp)                                            ***
## log1p(measles):thinness_1_19_years                                   ** 
## log1p(measles):income_composition_of_resources                       ** 
## log1p(measles):I(income_composition_of_resources^2)                  .  
## log1p(bmi):diphtheria                                                ***
## log1p(bmi):log1p(hiv_aids)                                           .  
## log1p(bmi):I(schooling^2)                                            ***
## log1p(under_five_deaths):log1p(polio)                                *  
## log1p(under_five_deaths):diphtheria                                  *  
## log1p(under_five_deaths):log1p(hiv_aids)                             ** 
## log1p(under_five_deaths):log1p(gdp)                                  .  
## log1p(under_five_deaths):thinness_1_19_years                         ***
## log1p(under_five_deaths):income_composition_of_resources             ***
## log1p(under_five_deaths):schooling                                   *  
## log1p(under_five_deaths):I(schooling^2)                              *  
## log1p(polio):schooling                                               ** 
## log1p(polio):I(schooling^2)                                          *  
## diphtheria:log1p(hiv_aids)                                           ** 
## diphtheria:log1p(gdp)                                                ***
## log1p(hiv_aids):log1p(gdp)                                           ***
## log1p(hiv_aids):thinness_1_19_years                                  *  
## log1p(hiv_aids):income_composition_of_resources                      ** 
## log1p(hiv_aids):I(income_composition_of_resources^2)                 *  
## log1p(hiv_aids):I(schooling^2)                                       ** 
## log1p(gdp):income_composition_of_resources                           ** 
## thinness_1_19_years:income_composition_of_resources                  ***
## thinness_1_19_years:I(income_composition_of_resources^2)             ***
## income_composition_of_resources:I(income_composition_of_resources^2) ***
## income_composition_of_resources:I(schooling^2)                       ***
## I(income_composition_of_resources^2):I(schooling^2)                  ***
## schooling:I(schooling^2)                                             ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.006 on 2558 degrees of freedom
## Multiple R-squared:  0.903,  Adjusted R-squared:  0.9001 
## F-statistic: 313.3 on 76 and 2558 DF,  p-value: < 2.2e-16

Diagnostic:

RMSE:

## [1] 2.981839
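
The report does not say how this RMSE was computed. The residual standard error of 3.006 on 2558 degrees of freedom with 2635 observations implies an in-sample RMSE of about 2.96, slightly below the printed value. That would be consistent with a leave-one-out cross-validated RMSE, although the source does not confirm this. The sketch below shows both variants. It uses the built-in `mtcars` data as a stand-in, and the names `fit`, `rmse_train` and `rmse_loocv` are illustrative, not taken from the project.

```r
# Stand-in data: the built-in mtcars set, NOT the life expectancy data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# In-sample RMSE: square root of the mean squared residual.
rmse_train <- sqrt(mean(resid(fit)^2))

# Leave-one-out cross-validated RMSE, obtained from the hat values
# so the model does not have to be refitted n times.
rmse_loocv <- sqrt(mean((resid(fit) / (1 - hatvalues(fit)))^2))

c(train = rmse_train, loocv = rmse_loocv)
```

The leave-one-out figure is never smaller than the in-sample one, because each residual is divided by a factor between 0 and 1.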

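Model selected by backward BIC search, starting from the full additive model (the column labelled AIC in the trace holds the BIC-penalised criterion):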

## Start:  AIC=7388.81
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - thinness_5_9_years               1       0.4 40985 7381.0
## - total_expenditure                1       1.7 40987 7381.0
## - population                       1       1.8 40987 7381.1
## - hepatitis_b                      1       4.0 40989 7381.2
## - alcohol                          1      20.0 41005 7382.2
## - thinness_1_19_years              1      32.9 41018 7383.1
## - measles                          1      57.6 41042 7384.6
## - percentage_expenditure           1      63.6 41048 7385.0
## - gdp                              1      96.5 41081 7387.1
## <none>                                         40985 7388.8
## - status                           1     301.6 41286 7400.3
## - polio                            1     454.4 41439 7410.0
## - diphtheria                       1     671.8 41657 7423.8
## - bmi                              1     810.8 41796 7432.6
## - income_composition_of_resources  1    1447.7 42432 7472.4
## - infant_deaths                    1    1685.8 42671 7487.2
## - under_five_deaths                1    1752.6 42737 7491.3
## - schooling                        1    4192.3 45177 7637.6
## - adult_mortality                  1    7568.8 48554 7827.5
## - hiv_aids                         1   11180.8 52166 8016.6
## 
## Step:  AIC=7380.96
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - total_expenditure                1       1.8 40987 7373.2
## - population                       1       1.9 40987 7373.2
## - hepatitis_b                      1       4.0 40989 7373.3
## - alcohol                          1      20.2 41005 7374.4
## - measles                          1      57.4 41043 7376.8
## - percentage_expenditure           1      63.6 41049 7377.2
## - gdp                              1      96.6 41082 7379.3
## <none>                                         40985 7381.0
## - thinness_1_19_years              1     161.3 41147 7383.4
## - status                           1     302.2 41287 7392.4
## - polio                            1     455.1 41440 7402.2
## - diphtheria                       1     671.4 41657 7415.9
## - bmi                              1     827.4 41813 7425.7
## - income_composition_of_resources  1    1447.4 42433 7464.5
## - infant_deaths                    1    1688.7 42674 7479.5
## - under_five_deaths                1    1753.8 42739 7483.5
## - schooling                        1    4192.5 45178 7629.7
## - adult_mortality                  1    7580.1 48565 7820.2
## - hiv_aids                         1   11196.6 52182 8009.5
## 
## Step:  AIC=7373.2
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     gdp + population + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - population                       1       1.8 40989 7365.4
## - hepatitis_b                      1       3.8 40991 7365.6
## - alcohol                          1      21.4 41008 7366.7
## - measles                          1      58.4 41045 7369.1
## - percentage_expenditure           1      64.3 41051 7369.5
## - gdp                              1      96.0 41083 7371.5
## <none>                                         40987 7373.2
## - thinness_1_19_years              1     165.4 41152 7375.9
## - status                           1     312.5 41299 7385.3
## - polio                            1     455.1 41442 7394.4
## - diphtheria                       1     672.9 41660 7408.2
## - bmi                              1     837.1 41824 7418.6
## - income_composition_of_resources  1    1449.7 42437 7456.9
## - infant_deaths                    1    1689.2 42676 7471.7
## - under_five_deaths                1    1754.1 42741 7475.7
## - schooling                        1    4244.6 45232 7625.0
## - adult_mortality                  1    7582.3 48569 7812.6
## - hiv_aids                         1   11241.1 52228 8004.0
## 
## Step:  AIC=7365.44
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - hepatitis_b                      1       4.1 40993 7357.8
## - alcohol                          1      21.5 41010 7358.9
## - measles                          1      60.7 41049 7361.5
## - percentage_expenditure           1      64.1 41053 7361.7
## - gdp                              1      96.3 41085 7363.7
## <none>                                         40989 7365.4
## - thinness_1_19_years              1     165.5 41154 7368.2
## - status                           1     311.4 41300 7377.5
## - polio                            1     454.7 41444 7386.6
## - diphtheria                       1     677.6 41666 7400.8
## - bmi                              1     838.3 41827 7410.9
## - income_composition_of_resources  1    1449.4 42438 7449.1
## - infant_deaths                    1    1746.5 42735 7467.5
## - under_five_deaths                1    1779.2 42768 7469.5
## - schooling                        1    4257.0 45246 7617.9
## - adult_mortality                  1    7588.0 48577 7805.1
## - hiv_aids                         1   11242.8 52232 7996.3
## 
## Step:  AIC=7357.83
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + diphtheria + hiv_aids + gdp + thinness_1_19_years + 
##     income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - alcohol                          1      22.8 41016 7351.4
## - measles                          1      60.6 41054 7353.8
## - percentage_expenditure           1      67.1 41060 7354.3
## - gdp                              1      95.1 41088 7356.1
## <none>                                         40993 7357.8
## - thinness_1_19_years              1     168.9 41162 7360.8
## - status                           1     309.6 41303 7369.8
## - polio                            1     452.7 41446 7378.9
## - diphtheria                       1     751.6 41745 7397.8
## - bmi                              1     837.4 41830 7403.2
## - income_composition_of_resources  1    1455.8 42449 7441.9
## - infant_deaths                    1    1761.7 42755 7460.8
## - under_five_deaths                1    1790.2 42783 7462.6
## - schooling                        1    4260.0 45253 7610.5
## - adult_mortality                  1    7592.3 48585 7797.7
## - hiv_aids                         1   11243.2 52236 7988.6
## 
## Step:  AIC=7351.42
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + measles + bmi + under_five_deaths + 
##     polio + diphtheria + hiv_aids + gdp + thinness_1_19_years + 
##     income_composition_of_resources + schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - measles                          1      59.5 41075 7347.4
## - percentage_expenditure           1      71.1 41087 7348.1
## - gdp                              1      90.9 41107 7349.4
## <none>                                         41016 7351.4
## - thinness_1_19_years              1     206.0 41222 7356.7
## - status                           1     454.9 41471 7372.6
## - polio                            1     456.9 41473 7372.7
## - diphtheria                       1     755.6 41771 7391.6
## - bmi                              1     837.9 41854 7396.8
## - income_composition_of_resources  1    1458.8 42475 7435.6
## - infant_deaths                    1    1739.5 42755 7453.0
## - under_five_deaths                1    1768.9 42785 7454.8
## - schooling                        1    4642.0 45658 7626.1
## - adult_mortality                  1    7579.6 48595 7790.4
## - hiv_aids                         1   11239.2 52255 7981.7
## 
## Step:  AIC=7347.36
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     percentage_expenditure + bmi + under_five_deaths + polio + 
##     diphtheria + hiv_aids + gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## - percentage_expenditure           1      70.7 41146 7344.0
## - gdp                              1      91.7 41167 7345.4
## <none>                                         41075 7347.4
## - thinness_1_19_years              1     194.8 41270 7352.0
## - status                           1     459.5 41535 7368.8
## - polio                            1     460.4 41536 7368.9
## - diphtheria                       1     760.3 41836 7387.8
## - bmi                              1     870.1 41945 7394.7
## - income_composition_of_resources  1    1481.9 42557 7432.9
## - infant_deaths                    1    1806.3 42882 7452.9
## - under_five_deaths                1    1884.9 42960 7457.7
## - schooling                        1    4626.3 45702 7620.7
## - adult_mortality                  1    7522.2 48597 7782.6
## - hiv_aids                         1   11284.7 52360 7979.1
## 
## Step:  AIC=7344.01
## life_expectancy ~ status + adult_mortality + infant_deaths + 
##     bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## 
##                                   Df Sum of Sq   RSS    AIC
## <none>                                         41146 7344.0
## - thinness_1_19_years              1     206.0 41352 7349.3
## - polio                            1     444.8 41591 7364.5
## - status                           1     503.5 41649 7368.2
## - diphtheria                       1     752.2 41898 7383.9
## - gdp                              1     831.5 41977 7388.9
## - bmi                              1     841.6 41988 7389.5
## - income_composition_of_resources  1    1456.5 42602 7427.8
## - infant_deaths                    1    1805.9 42952 7449.3
## - under_five_deaths                1    1883.0 43029 7454.0
## - schooling                        1    4672.3 45818 7619.6
## - adult_mortality                  1    7509.2 48655 7777.8
## - hiv_aids                         1   11241.1 52387 7972.6
## 
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths + 
##     bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.4119  -2.3041  -0.1221   2.2630  17.7790 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      5.533e+01  6.242e-01  88.639  < 2e-16 ***
## statusDeveloping                -1.450e+00  2.560e-01  -5.664 1.64e-08 ***
## adult_mortality                 -1.797e-02  8.217e-04 -21.875  < 2e-16 ***
## infant_deaths                    9.140e-02  8.520e-03  10.727  < 2e-16 ***
## bmi                              3.735e-02  5.100e-03   7.323 3.21e-13 ***
## under_five_deaths               -6.851e-02  6.254e-03 -10.954  < 2e-16 ***
## polio                            2.484e-02  4.665e-03   5.324 1.10e-07 ***
## diphtheria                       3.214e-02  4.642e-03   6.923 5.52e-12 ***
## hiv_aids                        -4.718e-01  1.763e-02 -26.764  < 2e-16 ***
## gdp                              4.962e-05  6.817e-06   7.279 4.42e-13 ***
## thinness_1_19_years             -8.638e-02  2.384e-02  -3.623 0.000296 ***
## income_composition_of_resources  6.278e+00  6.517e-01   9.634  < 2e-16 ***
## schooling                        7.521e-01  4.359e-02  17.255  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.961 on 2622 degrees of freedom
## Multiple R-squared:  0.8273, Adjusted R-squared:  0.8265 
## F-statistic:  1047 on 12 and 2622 DF,  p-value: < 2.2e-16
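
A trace like the one above can be produced with `step()` by setting the penalty to `log(n)`. In the trace, dropping a term with a negligible sum of squares lowers the criterion by about 7.85, which matches log(2635) ≈ 7.88 and suggests that is what was done here. The sketch below is a minimal illustration on the built-in `mtcars` data, not the project's code. The names `full_fit` and `bic_fit` are assumptions.

```r
# Stand-in data: built-in mtcars, NOT the life expectancy data.
full_fit <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)

# k = log(n) switches the penalty from AIC (k = 2) to BIC.
# With trace = 1 the printed table would still label the column "AIC".
bic_fit <- step(full_fit, direction = "backward", k = log(n), trace = 0)

# Formula of the model that survives the backward search.
formula(bic_fit)
```

Backward search only removes a term when doing so lowers the criterion, so the selected model never has a higher BIC than the starting model.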

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ status + adult_mortality + infant_deaths + 
##     bmi + under_five_deaths + polio + diphtheria + hiv_aids + 
##     gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## Model 2: life_expectancy ~ status + adult_mortality + infant_deaths + 
##     alcohol + percentage_expenditure + hepatitis_b + measles + 
##     bmi + under_five_deaths + polio + total_expenditure + diphtheria + 
##     hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years + 
##     income_composition_of_resources + schooling
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1   2622 41146                           
## 2   2615 40985  7     161.1 1.4684 0.1738
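
The table above is a partial F-test of the BIC-selected model (Model 1) against the full additive model (Model 2). With a p-value of 0.1738, the seven dropped predictors are not jointly significant at the usual levels, so the test gives no reason to prefer the full model. The sketch below recomputes the statistic from the rounded values printed in the table. The variable names are illustrative.

```r
# Numbers copied from the ANOVA table above (RSS values are rounded there).
rss_small <- 41146; df_small <- 2622   # BIC-selected model
rss_full  <- 40985; df_full  <- 2615   # full additive model

# Partial F statistic: drop in RSS per dropped parameter,
# scaled by the residual variance estimate of the larger model.
f_stat <- ((rss_small - rss_full) / (df_small - df_full)) / (rss_full / df_full)

# Upper-tail probability under F(7, 2615).
p_val <- pf(f_stat, df_small - df_full, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

Because the inputs are rounded, the result agrees with the printed F and p-value only to about two decimal places.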
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(bmi) + log1p(under_five_deaths) + 
##     log1p(polio) + diphtheria + log1p(hiv_aids) + gdp + thinness_1_19_years + 
##     income_composition_of_resources + schooling, data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.7870  -2.1306  -0.1575   2.2313  13.0730 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      6.350e+01  9.265e-01  68.541  < 2e-16 ***
## statusDeveloping                -1.603e+00  2.391e-01  -6.705 2.46e-11 ***
## log1p(adult_mortality)          -6.689e-01  7.785e-02  -8.592  < 2e-16 ***
## log1p(infant_deaths)             4.109e+00  5.492e-01   7.483 9.89e-14 ***
## log1p(bmi)                       1.547e-01  1.121e-01   1.379 0.167952    
## log1p(under_five_deaths)        -4.635e+00  5.248e-01  -8.832  < 2e-16 ***
## log1p(polio)                     1.722e-01  1.501e-01   1.148 0.251138    
## diphtheria                       2.955e-02  3.970e-03   7.444 1.31e-13 ***
## log1p(hiv_aids)                 -5.291e+00  1.175e-01 -45.026  < 2e-16 ***
## gdp                              4.603e-05  6.352e-06   7.246 5.63e-13 ***
## thinness_1_19_years             -6.928e-02  2.042e-02  -3.392 0.000703 ***
## income_composition_of_resources  7.603e+00  6.046e-01  12.576  < 2e-16 ***
## schooling                        5.155e-01  4.155e-02  12.407  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.686 on 2622 degrees of freedom
## Multiple R-squared:  0.8505, Adjusted R-squared:  0.8498 
## F-statistic:  1243 on 12 and 2622 DF,  p-value: < 2.2e-16

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ status + log1p(adult_mortality) + log1p(infant_deaths) + 
##     log1p(percentage_expenditure) + log1p(measles) + log1p(bmi) + 
##     log1p(under_five_deaths) + log1p(polio) + diphtheria + log1p(hiv_aids) + 
##     gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
## Model 2: life_expectancy ~ status + log1p(adult_mortality) + log1p(infant_deaths) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + gdp + thinness_1_19_years + income_composition_of_resources + 
##     schooling
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1   2620 35243                                  
## 2   2622 35618 -2   -374.94 13.937 9.535e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 3.506641
## [1] 3.509898
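
The table above lists the larger model first, which is why the `Df` and `Sum of Sq` entries are negative; the p-value of 9.535e-07 still means that dropping `log1p(percentage_expenditure)` and `log1p(measles)` is rejected at any usual significance level. A minimal sketch of the same nested-model F-test on simulated data follows. The data frame `sim_df` and the variables `x1`, `x2`, `x3` are hypothetical and are not part of the project data.

```r
# Simulated data; all names here are illustrative, not from the report.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1.5 * x2 + 0.8 * x3 + rnorm(n)
sim_df <- data.frame(y, x1, x2, x3)

small_fit <- lm(y ~ x1 + x2, data = sim_df)       # reduced (null) model
large_fit <- lm(y ~ x1 + x2 + x3, data = sim_df)  # full model

# F-test: H0 says the extra coefficient(s) in the larger model are zero.
cmp     <- anova(small_fit, large_fit)
p_value <- cmp[2, "Pr(>F)"]

# A small p-value favours the larger model; a large one favours the smaller.
preferred <- if (p_value < 0.05) "large_fit" else "small_fit"
print(cmp)
print(preferred)
```

Passing the models in the opposite order only flips the sign of `Df` and `Sum of Sq`; the F statistic and p-value are unchanged.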

Experiment with removing extreme values of life_expectancy in addition to outliers

## [1] 2530
## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2), 
##     data = life_clean1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.6747  -1.9937  -0.1942   1.7893  14.5429 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           66.062497   0.942618  70.084  < 2e-16 ***
## statusDeveloping                      -0.142822   0.228806  -0.624 0.532548    
## log1p(adult_mortality)                -0.523254   0.071300  -7.339 2.89e-13 ***
## log1p(infant_deaths)                   3.272498   0.489964   6.679 2.95e-11 ***
## log1p(percentage_expenditure)          0.094003   0.027201   3.456 0.000557 ***
## log1p(measles)                        -0.048781   0.026896  -1.814 0.069851 .  
## log1p(bmi)                            -0.049928   0.099689  -0.501 0.616532    
## log1p(under_five_deaths)              -3.434809   0.470761  -7.296 3.95e-13 ***
## log1p(polio)                           0.116578   0.135445   0.861 0.389484    
## diphtheria                             0.025991   0.003571   7.278 4.51e-13 ***
## log1p(hiv_aids)                       -4.381996   0.125400 -34.944  < 2e-16 ***
## log1p(gdp)                             0.061701   0.048077   1.283 0.199479    
## thinness_1_19_years                   -0.052576   0.018771  -2.801 0.005135 ** 
## income_composition_of_resources      -20.086726   1.581816 -12.699  < 2e-16 ***
## I(income_composition_of_resources^2)  36.247260   1.944354  18.642  < 2e-16 ***
## schooling                              0.439762   0.103294   4.257 2.14e-05 ***
## I(schooling^2)                        -0.014634   0.005040  -2.903 0.003724 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.222 on 2513 degrees of freedom
## Multiple R-squared:  0.8572, Adjusted R-squared:  0.8563 
## F-statistic: 942.9 on 16 and 2513 DF,  p-value: < 2.2e-16

## [1] 3.548072
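
The output above shows that 2530 rows remain after the filter, but the rule used to flag extreme values of `life_expectancy` is not shown in this section. The sketch below assumes a 1.5 × IQR rule on the response and a leave-one-out cross-validated RMSE as the error metric; both are assumptions made for illustration, and `sim_df`, `sim_clean` and `loocv_rmse` are hypothetical names.

```r
# Simulated data with a handful of injected extreme response values.
set.seed(1)
n <- 300
x <- runif(n, 0, 10)
y <- 50 + 2 * x + rnorm(n, sd = 3)
y[1:5] <- y[1:5] - 60          # make the first five responses extreme
sim_df <- data.frame(x, y)

# Assumed rule: keep rows whose response lies within 1.5 * IQR of the quartiles.
q    <- unname(quantile(sim_df$y, c(0.25, 0.75)))
iqr  <- q[2] - q[1]
keep <- sim_df$y >= q[1] - 1.5 * iqr & sim_df$y <= q[2] + 1.5 * iqr
sim_clean <- sim_df[keep, ]

# Leave-one-out cross-validated RMSE for a linear model, via hat values.
loocv_rmse <- function(fit) {
  sqrt(mean((resid(fit) / (1 - hatvalues(fit)))^2))
}

fit_all   <- lm(y ~ x, data = sim_df)
fit_clean <- lm(y ~ x, data = sim_clean)

print(nrow(sim_clean))
print(loocv_rmse(fit_all))
print(loocv_rmse(fit_clean))
```

Because rows are dropped based on the response itself, an error computed only on the trimmed data is not directly comparable with one computed on the full data.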

Experiment with regsubsets (from the leaps package) to find the best additive model. This technique can help identify a smaller model that still performs well.

BIC-based log-poly model

## 
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(measles) + log1p(bmi) + log1p(under_five_deaths) + 
##     log1p(polio) + diphtheria + log1p(hiv_aids) + log1p(gdp) + 
##     thinness_1_19_years + income_composition_of_resources + I(income_composition_of_resources^2) + 
##     schooling + I(schooling^2), data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.5833  -2.0267  -0.2112   2.0684  13.9777 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           64.946549   0.963194  67.428  < 2e-16 ***
## statusDeveloping                      -0.072957   0.244468  -0.298   0.7654    
## log1p(adult_mortality)                -0.512811   0.073440  -6.983 3.66e-12 ***
## log1p(infant_deaths)                   3.804791   0.517827   7.348 2.68e-13 ***
## log1p(measles)                        -0.046829   0.028139  -1.664   0.0962 .  
## log1p(bmi)                            -0.024713   0.105474  -0.234   0.8148    
## log1p(under_five_deaths)              -4.040098   0.496874  -8.131 6.50e-16 ***
## log1p(polio)                           0.106877   0.140584   0.760   0.4472    
## diphtheria                             0.028746   0.003733   7.701 1.90e-14 ***
## log1p(hiv_aids)                       -4.795234   0.112532 -42.612  < 2e-16 ***
## log1p(gdp)                             0.112921   0.049645   2.275   0.0230 *  
## thinness_1_19_years                   -0.027437   0.019599  -1.400   0.1616    
## income_composition_of_resources      -17.983107   1.629075 -11.039  < 2e-16 ***
## I(income_composition_of_resources^2)  34.467930   2.014526  17.110  < 2e-16 ***
## schooling                              0.435872   0.107730   4.046 5.36e-05 ***
## I(schooling^2)                        -0.012747   0.005312  -2.400   0.0165 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.452 on 2619 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8683 
## F-statistic:  1158 on 15 and 2619 DF,  p-value: < 2.2e-16

## Analysis of Variance Table
## 
## Model 1: life_expectancy ~ status + log1p(adult_mortality) + log1p(infant_deaths) + 
##     log1p(percentage_expenditure) + log1p(measles) + log1p(bmi) + 
##     log1p(under_five_deaths) + log1p(polio) + diphtheria + log1p(hiv_aids) + 
##     log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2)
## Model 2: life_expectancy ~ status + log1p(adult_mortality) + log1p(infant_deaths) + 
##     log1p(measles) + log1p(bmi) + log1p(under_five_deaths) + 
##     log1p(polio) + diphtheria + log1p(hiv_aids) + log1p(gdp) + 
##     thinness_1_19_years + income_composition_of_resources + I(income_composition_of_resources^2) + 
##     schooling + I(schooling^2)
##   Res.Df   RSS Df Sum of Sq      F   Pr(>F)   
## 1   2618 31109                                
## 2   2619 31201 -1    -91.96 7.7389 0.005443 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 3.431698
## [1] 3.4336
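
The F-test above rejects dropping `log1p(percentage_expenditure)` (p = 0.005443), while the two error values printed after it differ only in the third decimal place, so the smaller model gives up very little. A rough sketch of how a BIC-based subset can be picked with `regsubsets()` is shown below. It runs on simulated data; `sim_df`, `x1` to `x5`, `best_size` and `best_terms` are hypothetical names, and the exact call used in the project is not shown in this section.

```r
library(leaps)  # provides regsubsets(); the same package the report loads

# Simulated data: only x1 and x2 truly drive the response.
set.seed(7)
n <- 200
sim_df <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
  x4 = rnorm(n), x5 = rnorm(n)
)
sim_df$y <- 3 + 2 * sim_df$x1 - 1.5 * sim_df$x2 + rnorm(n)

# Exhaustive search over subsets of up to 5 predictors.
subsets         <- regsubsets(y ~ ., data = sim_df, nvmax = 5)
subsets_summary <- summary(subsets)

# BIC is reported once per model size; the smallest value wins.
best_size  <- which.min(subsets_summary$bic)
best_terms <- names(which(subsets_summary$which[best_size, ]))
best_terms <- setdiff(best_terms, "(Intercept)")

# Refit the chosen subset with lm() to get the usual summary and diagnostics.
best_fit <- lm(reformulate(best_terms, response = "y"), data = sim_df)
print(best_terms)
```

With factor predictors such as `status`, the columns of `subsets_summary$which` are model-matrix names (for example `statusDeveloping`), so they need to be mapped back to the original variable before refitting.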

Response log transform:

## 
## Call:
## lm(formula = log(life_expectancy) ~ status + log1p(adult_mortality) + 
##     log1p(infant_deaths) + log1p(percentage_expenditure) + log1p(measles) + 
##     log1p(bmi) + log1p(under_five_deaths) + log1p(polio) + diphtheria + 
##     log1p(hiv_aids) + log1p(gdp) + thinness_1_19_years + income_composition_of_resources + 
##     I(income_composition_of_resources^2) + schooling + I(schooling^2), 
##     data = non_cat_predictor_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44846 -0.02864 -0.00191  0.03238  0.19542 
## 
## Coefficients:
##                                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           4.142e+00  1.519e-02 272.599  < 2e-16 ***
## statusDeveloping                      2.830e-03  3.845e-03   0.736  0.46176    
## log1p(adult_mortality)               -6.931e-03  1.157e-03  -5.990 2.39e-09 ***
## log1p(infant_deaths)                  6.366e-02  8.142e-03   7.818 7.72e-15 ***
## log1p(percentage_expenditure)         1.193e-03  4.534e-04   2.632  0.00855 ** 
## log1p(measles)                       -1.028e-03  4.425e-04  -2.323  0.02026 *  
## log1p(bmi)                            3.841e-04  1.659e-03   0.232  0.81688    
## log1p(under_five_deaths)             -6.746e-02  7.813e-03  -8.633  < 2e-16 ***
## log1p(polio)                          1.920e-03  2.212e-03   0.868  0.38537    
## diphtheria                            4.619e-04  5.870e-05   7.868 5.23e-15 ***
## log1p(hiv_aids)                      -8.335e-02  1.786e-03 -46.659  < 2e-16 ***
## log1p(gdp)                            1.422e-03  7.983e-04   1.781  0.07504 .  
## thinness_1_19_years                  -1.314e-04  3.082e-04  -0.426  0.66992    
## income_composition_of_resources      -2.114e-01  2.573e-02  -8.219 3.20e-16 ***
## I(income_composition_of_resources^2)  4.367e-01  3.189e-02  13.693  < 2e-16 ***
## schooling                             8.297e-03  1.695e-03   4.895 1.04e-06 ***
## I(schooling^2)                       -2.566e-04  8.359e-05  -3.070  0.00216 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05427 on 2618 degrees of freedom
## Multiple R-squared:  0.8639, Adjusted R-squared:  0.8631 
## F-statistic:  1038 on 16 and 2618 DF,  p-value: < 2.2e-16

## [1] 3.494317
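
The residual standard error of 0.05427 above is on the log scale, so it cannot be compared directly with the values around 3.4 to 3.7 reported for the untransformed models. The final number, 3.494317, is of the same magnitude as the earlier error values, which suggests the predictions were exponentiated before the error was computed; that reading is an assumption. A minimal sketch of the back-transformation on simulated data, with hypothetical names `sim_df`, `log_fit` and `pred_original`:

```r
# Simulated strictly positive response, roughly on the scale of a life expectancy.
set.seed(3)
n <- 250
x <- runif(n, 0, 5)
y <- exp(3.5 + 0.1 * x + rnorm(n, sd = 0.05))
sim_df <- data.frame(x, y)

log_fit <- lm(log(y) ~ x, data = sim_df)

# fitted() is on the log scale; exponentiate to return to the units of y.
pred_original <- exp(fitted(log_fit))

rmse_log_scale <- sqrt(mean(resid(log_fit)^2))              # log units
rmse_original  <- sqrt(mean((sim_df$y - pred_original)^2))  # units of y

print(rmse_log_scale)
print(rmse_original)
```

Only `rmse_original` is on the same scale as the RMSE of a model fitted to the untransformed response.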